Flurex animal, PFOS and SCFA data

1 INFO

This document contains the commands necessary to analyse experimental data obtained from Flurex (internal project name: R20-22). The project data contains:

  • Animal weight data, including calculated weights per bw and normalized weight data in decimals:
    • body weight (bw) from day 0 to day 8, including bw gain from day 0 to day 8
    • liver and cecum weight from dissection on day 8

  • PFOS quantitative data:
    • total dosed PFOS per rat on day 4 and 8 respectively (mg)
    • blood day 4 and 8 in ug PFOS/mL serum including calculations on
      • total blood volume per animal based on a standard average of 64 mL blood per kilogram in rats (Diehl et al. 2001)
      • concentrations of PFOS on each day (ug/mL)
      • total PFOS in blood volume (mg)
      • total PFOS detected in blood from total dosed per day (pct)
    • liver from dissection on day 8 in ug PFOS/g tissue including calculations on
      • total PFOS in liver per rat
      • concentration of PFOS (ug/mL)
      • total PFOS detected in liver from total dosed on day 8 (pct)
    • isomer proportions of branched and linear PFOS presented as branched-linear ratio (bl-ratio)
  • Short-chain fatty acids quantification of 10 compounds in colonic water given in mM from day 8:
    • acetic acid (acetate)
    • formic acid (formate)
    • propanoic acid (propionate)
    • 2-methyl-propanoic acid (isobutyrate)
    • butanoic acid (butyrate)
    • 3-methyl-butanoic acid (isovalerate)
    • pentanoic acid (valerate)
    • 4-methyl-pentanoic acid (isocaproate)
    • hexanoic acid (caproate)
    • heptanoic acid (enanthate)
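
The blood calculations listed above can be sketched as follows. This is a minimal illustration with made-up values for a single rat, not the study's own code. The variable names are hypothetical, and applying the measured serum concentration to the whole estimated blood volume is an assumption based on the list above.

```r
# Hypothetical example values for one rat (not taken from the study data)
bw_g            <- 300   # body weight in grams
pfos_serum_ugml <- 10    # measured PFOS concentration in ug/mL
tot_pfos_mg     <- 1.5   # total PFOS dosed up to the sampling day, in mg

# Total blood volume from the standard average of 64 mL blood per kg
blood_vol_mL <- bw_g / 1000 * 64

# Total PFOS in the blood volume, first in ug and then in mg
pfos_blood_ug <- pfos_serum_ugml * blood_vol_mL
pfos_blood_mg <- pfos_blood_ug / 1000

# Share of the dosed PFOS detected in blood, in percent
pfos_blood_pct <- pfos_blood_mg / tot_pfos_mg * 100

c(blood_vol_mL = blood_vol_mL,
  pfos_blood_mg = pfos_blood_mg,
  pfos_blood_pct = pfos_blood_pct)
```

With these example values, the rat has an estimated blood volume of 19.2 mL, holding 0.192 mg PFOS, or 12.8 % of the dose.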

2 Setup

The following code loads packages, creates the necessary folders, and saves the parameters for the subsequent analyses.

knitr::opts_chunk$set(echo = TRUE)

# Load libraries
library(tidyverse)
library(phyloseq)
library(decontam)
library(pals)
library(ggpubr)
library(vegan)
library(phangorn)
library(kableExtra)
library(plotly)
library(rstatix)
library(forcats)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggbreak)
library(ggrepel)
library(DAtest)
library(cowplot)
library(pheatmap)

# Create used folders if missing
# recursive = TRUE also creates missing parent folders, so "plots" and
# "plots/animal_data" are covered by the "weight" subfolder used by ggsave() later
for (d in c("R_objects", "plots/animal_data/weight", "scripts")) {
  if (!dir.exists(d)) dir.create(file.path(getwd(), d), recursive = TRUE)
}

# Save params
saveRDS(params, file = "R_objects/animal_params.RDS")

3 LOAD DATA

Load the data from CSV format and save it in Rdata format.

# `params` is already available in this chunk as the locked knitr parameter list,
# so it is used directly instead of being reassigned from the saved RDS file
# Load analysis data
dat <- read.csv(params$input, header = TRUE, sep = ";", dec = ",")

save(dat, file = "R_objects/animal_data.Rdata")

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

4 ANIMAL WEIGHT DATA

Animal weight data contains data from body weight through the entire study period with calculated body weight gain, and organ weights from cecum and liver.

4.1 Body weight gain

This section prepares and performs the data analysis for body weight gain.

4.1.1 Statistics

4.1.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- dat
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "bw_gain"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      bw_gain     12  17.1  2.26
## 2 PFOS      bw_gain     12  17.2  4.04
## 3 VAN       bw_gain     12  17.6  2.69
## 4 VAN+PFOS  bw_gain     12  17.5  2.65
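
To illustrate how the formula construction above behaves for one or several predictors, the following sketch wraps the same logic in a small helper. The name `make_formula` is hypothetical and introduced only for this illustration.

```r
# Hypothetical helper that mirrors the formula construction used above
make_formula <- function(outcome, predictor) {
  # Several predictors are crossed with "*" (main effects plus interactions)
  rhs <- if (length(predictor) > 1) paste(predictor, collapse = "*") else predictor
  as.formula(paste(outcome, rhs, sep = " ~ "))
}

f1 <- make_formula("bw_gain", "treatment")
f2 <- make_formula("bw_gain", c("pfos", "van"))

deparse(f1)  # "bw_gain ~ treatment"
deparse(f2)  # "bw_gain ~ pfos * van"
```

With a single predictor the result is a one-way model; with `c("pfos", "van")` it becomes a two-way model that includes the interaction term.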

4.1.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This is already ensured for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no     310   321.  325.  339.  350    354
## 2 PFOS      R30            18 yes   no     258.  262.  265.  271.  270.   274
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain two outliers: rats R01 and R30.
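
The boxplot rule behind this kind of outlier check can be sketched in base R. The values below are made up, and the thresholds (1.5 × IQR for an outlier, 3 × IQR for an extreme point) follow the rule as documented for identify_outliers(). This is an illustration of the rule, not a reimplementation of the rstatix function.

```r
# Hypothetical values for a single group; 22 lies above the rest
x <- c(15, 16, 17, 17, 18, 18, 19, 22)

q   <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Boxplot rule: beyond 1.5 x IQR from the quartiles is an outlier,
# beyond 3 x IQR is an extreme point
is_outlier <- x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr
is_extreme <- x < q[[1]] - 3 * iqr   | x > q[[2]] + 3 * iqr

data.frame(x, is_outlier, is_extreme)
```

Here the value 22 is flagged as an outlier but not as an extreme point, which is the kind of case that can reasonably be kept in the analysis.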

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.951  0.0457
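
As a sketch of what is being tested here: the residuals of a one-way model are the deviations of each observation from its group mean, so a single Shapiro-Wilk test covers all groups at once. The example uses simulated data and base R's shapiro.test(); the data frame `sim` and its columns are hypothetical.

```r
set.seed(1)
# Simulated data: four groups of 12 animals with a common standard deviation
sim <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  outcome   = rnorm(48, mean = 17, sd = 3)
)

fit <- lm(outcome ~ treatment, data = sim)
res <- residuals(fit)

# The residuals equal each observation minus its own group mean
group_means <- ave(sim$outcome, sim$treatment)
all.equal(unname(res), sim$outcome - group_means)

# Shapiro-Wilk test on the pooled residuals
sw <- shapiro.test(res)
sw$p.value
```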

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.530 0.664
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

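This shows that the body weight gain data have two outliers and equal variances (Levene’s test, p = 0.664). With the outliers included, the Shapiro-Wilk test on the residuals is borderline (p = 0.0457); without the outliers the data are normally distributed. Therefore we use a one-way ANOVA test with Tukey’s honestly significant difference (HSD) test.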

4.1.1.3 ANOVA One-Way test

4.1.1.3.1 Perform test

We can now run a one-way ANOVA with anova_test() if the variances are equal, or with welch_anova_test() if the variances differ.
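
The same branching can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the analysis code itself: the Levene-type test is computed by hand as an ANOVA on absolute deviations from the group medians, and oneway.test() stands in for both the classical and the Welch test. All object names are hypothetical.

```r
set.seed(3)
sim <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  outcome   = rnorm(48, mean = 17, sd = 3)
)

# Levene-type test (median-centred): ANOVA on absolute deviations from group medians
sim$abs_dev <- abs(sim$outcome - ave(sim$outcome, sim$treatment, FUN = median))
levene_p    <- anova(lm(abs_dev ~ treatment, data = sim))[["Pr(>F)"]][1]
equal_var   <- levene_p > 0.05

# var.equal = TRUE gives the classical one-way ANOVA F test,
# var.equal = FALSE gives the Welch test
res <- oneway.test(outcome ~ treatment, data = sim, var.equal = equal_var)
res$method
```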

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd     F     p p<.05   ges
## 1 treatment   3  44 0.064 0.979       0.004

4.1.1.3.2 Perform post hoc test

A significant one-way ANOVA is generally followed by Tukey post hoc tests to perform multiple pairwise comparisons between groups. When the Welch one-way test is used instead, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible pairs of groups.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term   group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr>  <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl> <dbl> <chr>       
## 1 treat… CTRL   PFOS            0   0.111     -3.15      3.37 1     ns          
## 2 treat… CTRL   VAN             0   0.463     -2.79      3.72 0.981 ns          
## 3 treat… CTRL   VAN+P…          0   0.374     -2.88      3.63 0.99  ns          
## 4 treat… PFOS   VAN             0   0.353     -2.90      3.61 0.991 ns          
## 5 treat… PFOS   VAN+P…          0   0.264     -2.99      3.52 0.996 ns          
## 6 treat… VAN    VAN+P…          0  -0.0893    -3.35      3.17 1     ns
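
For comparison, the Tukey step can be sketched in base R with TukeyHSD() on simulated data. The data frame `sim` is hypothetical; the point is only that four groups give choose(4, 2) = 6 pairwise comparisons, each with an estimate, a confidence interval and an adjusted p-value, as in the table above.

```r
set.seed(2)
sim <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  outcome   = rnorm(48, mean = 17, sd = 3)
)

# Tukey's HSD on a one-way ANOVA fit
tuk <- TukeyHSD(aov(outcome ~ treatment, data = sim))

# One row per pair of groups; columns: diff, lwr, upr, p adj
tuk$treatment
nrow(tuk$treatment)
```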

4.1.2 Create figure

# Prepare statistical information
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "Bodyweight gain",limits = c(5,25),breaks = seq(5,25,5), labels = function(x) paste0(x, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

# Output plot
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

Body weight gain from day 0 to day 8

Cecum weight (grams)

This section prepares and performs the data analysis for cecum weight data in grams.

4.1.3 Statistics

4.1.3.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- subset(dat, !is.na(cecum_norm))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "cecum_wt"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      cecum_wt    11  5.08  1.13
## 2 PFOS      cecum_wt    12  4.70  1.23
## 3 VAN       cecum_wt    12  9.41  1.27
## 4 VAN+PFOS  cecum_wt    11  9.97  1.14

4.1.3.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This is already ensured for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no     310   321.  325.  339.  350    354
## 2 PFOS      R25            13 yes   no     339.  340.  353.  364.  348.   358
## 3 VAN       R15            27 no    yes    268.  277.  283.  290.  296.   300
## 4 VAN+PFOS  R43            43 yes   yes    292.  301.  300.  313.  316.   322
## 5 VAN+PFOS  R44            44 yes   yes    261.  269.  277   284.  287.   296
## 6 VAN+PFOS  R47            47 yes   yes    242.  249.  255.  263.  267.   271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain six non-critical outliers.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.954  0.0655

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    42    0.0879 0.966
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the cecum weight data (grams) have six non-critical outliers, are normally distributed, and have equal variances. Therefore we use an ANOVA test with Tukey’s honestly significant difference (HSD) test.

4.1.3.3 ANOVA One-Way test

4.1.3.3.1 Perform test

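We can now run anova_test() if the variances are equal, or welch_anova_test() if the variances differ. Note that the equal-variance branch below fits the two-way model cecum_wt ~ pfos*van instead of FORMULA, so the table reports the main effects of pfos and van and their interaction rather than a single treatment effect.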

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(cecum_wt ~ pfos*van)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##     Effect DFn DFd       F        p p<.05   ges
## 1     pfos   1  42   0.066 7.99e-01       0.002
## 2      van   1  42 185.208 5.41e-17     * 0.815
## 3 pfos:van   1  42   1.730 1.96e-01       0.040
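
The relation between the four treatment groups and the pfos × van factorial can be sketched on simulated data. The sketch assumes the mapping visible in the outlier tables (CTRL = no/no, PFOS = yes/no, VAN = no/yes, VAN+PFOS = yes/yes). Note that base R's anova() reports sequential (type I) tests, whereas the table above is type II, so the F values of the two approaches need not match on unbalanced data.

```r
set.seed(4)
# 2 x 2 design: the four treatment groups are the cells of pfos x van
design <- expand.grid(pfos = c("no", "yes"), van = c("no", "yes"))
design$treatment <- c("CTRL", "PFOS", "VAN", "VAN+PFOS")

# 12 simulated animals per cell, with an effect of van only
sim <- design[rep(1:4, each = 12), ]
sim$outcome <- rnorm(48, mean = ifelse(sim$van == "yes", 9.5, 5), sd = 1.2)

two_way <- anova(lm(outcome ~ pfos * van, data = sim))
one_way <- anova(lm(outcome ~ treatment, data = sim))

# The 3 treatment degrees of freedom split into pfos, van and pfos:van
rownames(two_way)
sum(two_way$Df[1:3]) == one_way$Df[1]
```

Both models fit the same four cell means, so they share the same residual sum of squares; the two-way form only partitions the between-group variation into the two main effects and their interaction.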

4.1.3.3.2 Perform post hoc test

A significant ANOVA is generally followed by Tukey post hoc tests to perform multiple pairwise comparisons between groups. When the Welch one-way test is used instead, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible pairs of groups.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term      group1 group2   null.value estimate conf.low conf.high    p.adj
## * <chr>     <chr>  <chr>         <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
## 1 treatment CTRL   PFOS              0   -0.374   -1.71      0.961 8.77e- 1
## 2 treatment CTRL   VAN               0    4.34     3.00      5.67  3.69e-10
## 3 treatment CTRL   VAN+PFOS          0    4.89     3.53      6.26  2.36e-11
## 4 treatment PFOS   VAN               0    4.71     3.41      6.02  2   e-11
## 5 treatment PFOS   VAN+PFOS          0    5.27     3.93      6.60  2.39e-12
## 6 treatment VAN    VAN+PFOS          0    0.554   -0.780     1.89  6.85e- 1
## # ℹ 1 more variable: p.adj.signif <chr>

4.1.4 Create figure

# Prepare statistical information
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "grams",limits = c(0,15),breaks = seq(0,15,5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(12,14,13,15))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
Normalized cecum weights at day 8

4.2 Cecum weight (normalized)

This section prepares and performs the data analysis for normalized cecum weight data.

4.2.1 Statistics

4.2.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- subset(dat, !is.na(cecum_norm))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "cecum_norm"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable       n  mean    sd
##   <chr>     <fct>      <dbl> <dbl> <dbl>
## 1 CTRL      cecum_norm    11 1     0.148
## 2 PFOS      cecum_norm    12 0.944 0.164
## 3 VAN       cecum_norm    12 1.88  0.24 
## 4 VAN+PFOS  cecum_norm    11 2.07  0.201
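
The CTRL mean of exactly 1 suggests that the normalized values are expressed relative to the control group mean. The following sketch shows such a normalization on made-up values. Both the method and the choice of the per-body-weight column as its basis are assumptions, since the calculation itself is not part of this document.

```r
# Hypothetical per-body-weight cecum weights for two groups
toy <- data.frame(
  treatment   = rep(c("CTRL", "VAN"), each = 3),
  cecum_wt_bw = c(1.4, 1.6, 1.5, 2.8, 3.0, 3.2)
)

# Divide every value by the mean of the control group
ctrl_mean      <- mean(toy$cecum_wt_bw[toy$treatment == "CTRL"])
toy$cecum_norm <- toy$cecum_wt_bw / ctrl_mean

# Group means on the normalized scale: CTRL is 1 by construction
norm_means <- tapply(toy$cecum_norm, toy$treatment, mean)
norm_means
```

On this scale a group mean of 2 reads directly as twice the control mean, which is how the decimals in the summary table can be interpreted under this assumption.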

4.2.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This is already ensured for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R15            27 no    yes    268.  277.  283.  290.  296.   300
## 2 VAN       R24            36 no    yes    281.  286.  294.  305.  309.   312
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains two outliers.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.984   0.753

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    42     0.416 0.742
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the normalised cecum weight data have two outliers, are normally distributed, and have equal variances. Therefore we use a one-way ANOVA test with Tukey’s honestly significant difference (HSD) test.

4.2.1.3 ANOVA One-Way test

4.2.1.3.1 Perform test

We can now run a one-way ANOVA with anova_test() if the variances are equal, or with welch_anova_test() if the variances differ.

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd       F        p p<.05   ges
## 1 treatment   3  42 106.226 1.21e-19     * 0.884

4.2.1.3.2 Perform post hoc test

A significant one-way ANOVA is generally followed by Tukey post hoc tests to perform multiple pairwise comparisons between groups. When the Welch one-way test is used instead, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible pairs of groups.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term      group1 group2   null.value estimate conf.low conf.high    p.adj
## * <chr>     <chr>  <chr>         <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
## 1 treatment CTRL   PFOS              0  -0.0562  -0.271      0.158 8.96e- 1
## 2 treatment CTRL   VAN               0   0.884    0.669      1.10  1.42e-12
## 3 treatment CTRL   VAN+PFOS          0   1.07     0.850      1.29  1.06e-12
## 4 treatment PFOS   VAN               0   0.940    0.730      1.15  1.09e-12
## 5 treatment PFOS   VAN+PFOS          0   1.13     0.911      1.34  1.06e-12
## 6 treatment VAN    VAN+PFOS          0   0.186   -0.0287     0.400 1.1 e- 1
## # ℹ 1 more variable: p.adj.signif <chr>

4.2.2 Create figure

# Prepare statistical information
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% difference",limits = c(0.5,3.1),breaks = seq(0.5,3.1,0.5), labels = function(x) paste0(x*100, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.2,2.8,2.5,3.1))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
Normalized cecum weights at day 8

4.3 Liver weight (grams)

This section prepares and performs the data analysis for liver weight data in grams.

4.3.1 Statistics

4.3.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(liver_norm))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "liver_wt"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      liver_wt    12  11.2 1.12 
## 2 PFOS      liver_wt    12  12.0 1.51 
## 3 VAN       liver_wt    12  10.5 0.973
## 4 VAN+PFOS  liver_wt    12  11.3 1.47

4.3.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This is already ensured for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no      310  321.  325.  339.   350   354
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains one non-critical outlier.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.989   0.922

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44      1.50 0.227
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the liver weight data (grams) have one non-critical outlier, are normally distributed, and have equal variances. Therefore we use an ANOVA test with Tukey’s honestly significant difference (HSD) test.

4.3.1.3 ANOVA One-Way test

4.3.1.3.1 Perform test

With the assumptions checked, we run anova_test() if the variances are equal, or welch_anova_test() if they differ. Note that the equal-variance branch below fits liver_wt ~ pfos*van, i.e. a two-factor model with interaction, rather than FORMULA.

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(liver_wt ~ pfos*van) #FORMULA
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##     Effect DFn DFd        F     p p<.05      ges
## 1     pfos   1  44 4.977000 0.031     * 1.02e-01
## 2      van   1  44 3.492000 0.068       7.40e-02
## 3 pfos:van   1  44 0.000905 0.976       2.06e-05
4.3.1.3.2 Perform posthoc test

A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. After a Welch one-way test, the Games-Howell post-hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of groups.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA) #FORMULA
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term  group1 group2 null.value estimate conf.low conf.high  p.adj p.adj.signif
## * <chr> <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl>  <dbl> <chr>       
## 1 trea… CTRL   PFOS            0    0.820   -0.587     2.23  0.414  ns          
## 2 trea… CTRL   VAN             0   -0.708   -2.11      0.699 0.541  ns          
## 3 trea… CTRL   VAN+P…          0    0.135   -1.27      1.54  0.994  ns          
## 4 trea… PFOS   VAN             0   -1.53    -2.93     -0.121 0.0287 *           
## 5 trea… PFOS   VAN+P…          0   -0.685   -2.09      0.722 0.568  ns          
## 6 trea… VAN    VAN+P…          0    0.843   -0.565     2.25  0.39   ns
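
For readers without rstatix at hand, the Tukey step can be sketched with base R alone. The data below are simulated and the names `toy`, `fit` and `tuk_tab` are placeholders; this is an illustration of the technique, not a rerun of the project analysis.

```r
# Simulated stand-in data with the four treatment labels used in the report
set.seed(2)
toy <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  outcome = rnorm(48, mean = rep(c(1.00, 1.08, 0.93, 1.03), each = 12), sd = 0.05)
)

# One-way ANOVA followed by Tukey's HSD for all pairwise comparisons
fit <- aov(outcome ~ treatment, data = toy)
tuk <- TukeyHSD(fit, "treatment", conf.level = 0.95)

# One row per pair: difference in means, confidence bounds, adjusted p-value
tuk_tab <- as.data.frame(tuk$treatment)
tuk_tab
```

Four groups give six pairwise comparisons, which is why the post-hoc table above has six rows.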

4.3.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "µg/g") + #,limits = c(8,17),breaks = seq(8,17,2)
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE) #, y.position = c(1.35,1.4,1.45,1.5)
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
Liver weights at day 8

4.4 Liver weight (normalized)

This section analyses the normalized liver weight data.

4.4.1 Statistics

4.4.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(liver_norm))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "liver_norm"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable       n  mean    sd
##   <chr>     <fct>      <dbl> <dbl> <dbl>
## 1 CTRL      liver_norm    12 1     0.055
## 2 PFOS      liver_norm    12 1.08  0.055
## 3 VAN       liver_norm    12 0.934 0.044
## 4 VAN+PFOS  liver_norm    12 1.03  0.065
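
The formula construction above can be checked in isolation. The helper `build_formula()` below is a hypothetical wrapper written for this illustration; it only uses base R and shows what the one-predictor and two-predictor settings of PREDICTOR expand to.

```r
# Hypothetical helper: build "outcome ~ a*b*..." from character vectors
build_formula <- function(outcome, predictors) {
  rhs <- paste(predictors, collapse = "*")  # a single predictor is returned unchanged
  as.formula(paste(outcome, rhs, sep = " ~ "))
}

f_one <- build_formula("liver_norm", "treatment")
f_two <- build_formula("liver_norm", c("pfos", "van"))

# Terms the two-predictor formula expands to: main effects plus interaction
two_terms <- attr(terms(f_two), "term.labels")
two_terms
```

Because `paste(..., collapse = "*")` leaves a length-one vector unchanged, the same call covers both cases.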

4.4.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This holds by design for the whole project.

  • No significant outliers in any group.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rows> (or 0-length row.names)

Data contains zero outliers.
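
The boxplot rule behind this check can be sketched in base R. The function `flag_outliers()` and the numbers in `toy` are made up for illustration; the sketch assumes the usual fences of 1.5 × IQR for an outlier and 3 × IQR for an extreme point, which is how I understand `identify_outliers()` to define its `is.outlier` and `is.extreme` columns.

```r
# Hypothetical helper applying the boxplot fences to a numeric vector
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  data.frame(
    value = x,
    is.outlier = x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr,
    is.extreme = x < q[[1]] - 3 * iqr | x > q[[2]] + 3 * iqr
  )
}

# Made-up values: Q1 = 12.5, Q3 = 17.5, so the upper fences are 25 and 32.5
toy <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 26, 60)
res <- flag_outliers(toy)
res[res$is.outlier, ]
```

Here 26 is flagged as an outlier only, while 60 is also flagged as extreme. In the report the same rule is applied within each treatment group.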

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.985   0.778
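
A base R version of the residual check is sketched below on simulated data (the `toy` data frame is a placeholder, not project data). Instead of drawing the QQ plot, it extracts the plotted quantiles and reports their correlation as a numeric stand-in for visual inspection.

```r
# Simulated stand-in data: four treatment groups of 12
set.seed(3)
toy <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  outcome = rnorm(48, mean = rep(c(1.00, 1.08, 0.93, 1.03), each = 12), sd = 0.05)
)

# Fit the group-means model and take its residuals
fit <- lm(outcome ~ treatment, data = toy)
res <- residuals(fit)

# Shapiro-Wilk test on the residuals
sw <- shapiro.test(res)

# Quantiles that a QQ plot would show, without opening a graphics device
qq <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

c(W = unname(sw$statistic), p = sw$p.value, qq_cor = qq_cor)
```

A correlation close to 1 corresponds to points lying near the reference line in the QQ plot.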

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.430 0.733
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the normalised liver weight data has no outliers, is normally distributed and has equal variance. Therefore we use a one-way ANOVA test with Tukey’s honestly significant difference (HSD) test.

4.4.1.3 ANOVA One-Way test

4.4.1.3.1 Perform test

With the assumptions checked, we run a one-way ANOVA with anova_test() if the variances are equal, or with welch_anova_test() if they differ.

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd      F        p p<.05  ges
## 1 treatment   3  44 15.909 3.79e-07     * 0.52
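
The `ges` column (generalized eta squared) can be cross-checked from the F statistic. For a one-way between-subjects design it reduces to SS_effect / (SS_effect + SS_error), and since F = (SS_effect / DFn) / (SS_error / DFd), it can be recovered from the table values alone. The short sketch below does that arithmetic; the variable names are chosen for illustration.

```r
# Values taken from the ANOVA table above
f_value   <- 15.909
df_effect <- 3
df_error  <- 44

# ges = SS_effect / (SS_effect + SS_error), rewritten in terms of F and the dfs
ges <- (f_value * df_effect) / (f_value * df_effect + df_error)
round(ges, 2)  # 0.52, matching the table
```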
4.4.1.3.2 Perform posthoc test

A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. After a Welch one-way test, the Games-Howell post-hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of groups.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term      group1 group2   null.value estimate conf.low conf.high       p.adj
## * <chr>     <chr>  <chr>         <dbl>    <dbl>    <dbl>     <dbl>       <dbl>
## 1 treatment CTRL   PFOS              0   0.0846   0.0246   0.145   0.0027     
## 2 treatment CTRL   VAN               0  -0.0663  -0.126   -0.00623 0.0254     
## 3 treatment CTRL   VAN+PFOS          0   0.0352  -0.0249   0.0953  0.409      
## 4 treatment PFOS   VAN               0  -0.151   -0.211   -0.0909  0.000000182
## 5 treatment PFOS   VAN+PFOS          0  -0.0494  -0.110    0.0107  0.14       
## 6 treatment VAN    VAN+PFOS          0   0.102    0.0415   0.162   0.000269   
## # ℹ 1 more variable: p.adj.signif <chr>

4.4.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% difference",limits = c(0.75,1.5),breaks = seq(0.75,1.5,0.25), labels = function(x) paste0(x*100, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(1.35,1.4,1.45,1.5))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
Normalized liver weights at day 8

5 PFOS QUANTITATIVE DATA

The following section handles data analysis of PFOS from serum and liver samples (run on a Dionex Ultimate 3000 / Bruker EVOQ Elite UPLC-MS/MS against a linear PFOS standard curve and with an internal MPFOS standard).

5.1 Blood serum day 4

This section analyses the PFOS data from serum on day 4.

5.1.1 ug/mL in serum

5.1.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum4_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum4_ugml"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable             n   mean    sd
##   <chr>     <fct>            <dbl>  <dbl> <dbl>
## 1 CTRL      pfos_serum4_ugml    11  0     0.001
## 2 PFOS      pfos_serum4_ugml    12  9.17  2.01 
## 3 VAN       pfos_serum4_ugml    11  0.001 0.001
## 4 VAN+PFOS  pfos_serum4_ugml    10 10.0   1.18

5.1.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This holds by design for the whole project.

  • No significant outliers in any group.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R06             6 no    no     274.  276.  285.  296.  297.   302
## 2 CTRL      R11            11 no    no     271.  278.  287.  294.  293.   299
## 3 PFOS      R29            17 yes   no     239.  243.  248.  255.  259.   267
## 4 VAN       R13            25 no    yes    218.  222.  228.  234   231.   241
## 5 VAN       R21            33 no    yes    262.  268.  274.  285.  281.   291
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Data contains five outliers: R06 and R11 (CTRL), R29 (PFOS), and R13 and R21 (VAN).

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic    p.value
##   <chr>                <dbl>      <dbl>
## 1 residuals(model)     0.807 0.00000412

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    40      7.48 0.000436
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the day 4 serum PFOS concentration data has five outliers, unequal variances (Levene’s test, p = 0.000436) and residuals that are not normally distributed (Shapiro-Wilk test, p < 0.001). Therefore we use a non-parametric Kruskal-Wallis test followed by Dunn’s test with FDR-adjusted p-values.

5.1.1.3 Kruskal-Wallis test

5.1.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.                  n statistic    df           p method        
## * <chr>            <int>     <dbl> <int>       <dbl> <chr>         
## 1 pfos_serum4_ugml    44      35.4     3 0.000000101 Kruskal-Wallis
5.1.1.3.0.2 Effect size

Eta squared, based on the H-statistic, can be used as the measure of effect size for the Kruskal-Wallis test. The interpretation thresholds commonly used in the published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.                  n effsize method  magnitude
## * <chr>            <int>   <dbl> <chr>   <ord>    
## 1 pfos_serum4_ugml    44   0.809 eta2[H] large
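
The effect size can be reproduced by hand from the Kruskal-Wallis output. The sketch assumes the common definition eta2[H] = (H - k + 1) / (n - k), with H the test statistic, k the number of groups and n the number of observations; the variable names are chosen for illustration.

```r
# Values taken from the Kruskal-Wallis output above
h_stat   <- 35.4  # H statistic (rounded in the printed table)
k_groups <- 4     # CTRL, PFOS, VAN, VAN+PFOS
n_obs    <- 44

# Eta squared based on the H statistic
eta2_h <- (h_stat - k_groups + 1) / (n_obs - k_groups)
round(eta2_h, 2)  # 0.81, in line with the reported 0.809
```

The small difference from 0.809 comes from using the rounded H value printed in the table.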
5.1.1.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_serum4_… CTRL   PFOS      11    12    3.86   1.15e-4 1.85e-4 ***         
## 2 pfos_serum4_… CTRL   VAN       11    11    0.0172 9.86e-1 9.86e-1 ns          
## 3 pfos_serum4_… CTRL   VAN+P…    11    10    4.53   5.87e-6 1.91e-5 ****        
## 4 pfos_serum4_… PFOS   VAN       12    11   -3.84   1.23e-4 1.85e-4 ***         
## 5 pfos_serum4_… PFOS   VAN+P…    12    10    0.863  3.88e-1 4.66e-1 ns          
## 6 pfos_serum4_… VAN    VAN+P…    11    10    4.51   6.36e-6 1.91e-5 ****
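
The `p.adj` column can be cross-checked with base R. The sketch below feeds the raw p-values printed above into `p.adjust()` with the "fdr" (Benjamini-Hochberg) method; the vector names are labels added for readability.

```r
# Raw p-values copied from the Dunn's test table above
p_raw <- c(
  "CTRL vs PFOS"     = 1.15e-4,
  "CTRL vs VAN"      = 9.86e-1,
  "CTRL vs VAN+PFOS" = 5.87e-6,
  "PFOS vs VAN"      = 1.23e-4,
  "PFOS vs VAN+PFOS" = 3.88e-1,
  "VAN vs VAN+PFOS"  = 6.36e-6
)

# Benjamini-Hochberg adjustment across the six comparisons
p_fdr <- p.adjust(p_raw, method = "fdr")
signif(p_fdr, 3)
```

Up to rounding of the printed inputs, the adjusted values agree with the `p.adj` column, and the same four comparisons stay below 0.05.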

5.1.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "ug/mL",limits = c(0,20),breaks = seq(0,20,5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(14,17,15,14))
p
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 16 rows containing missing values (`geom_point()`).

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
PFOS conc. in ug/mL at day 4

5.1.2 Total mg in serum

5.1.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_mg"
SUBJECT <- "rat_name"

# Subset to PFOS-dosed animals only
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_mg))

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.1.2.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rows> (or 0-length row.names)

Extreme outliers can be caused by bad samples or errors in data entry. If outliers are present, compare the test with and without them to determine whether they matter, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. A QQ plot compares the quantiles of the data with those of a normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable       statistic     p
##   <chr>     <chr>              <dbl> <dbl>
## 1 PFOS      pfos_serum4_mg     0.955 0.717
## 2 VAN+PFOS  pfos_serum4_mg     0.930 0.446
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro-Wilk tests have p > 0.05 and the points in the QQ plots follow the reference line, the data can be treated as normally distributed.

If the data does not follow the normal distribution, run a Wilcoxon rank-sum test.

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    20    0.0274 0.870
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

No outliers were identified. Data is normally distributed and has equal variance. Hence we use t-test.
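
The choice between the Student and Welch variants can be illustrated with base R `t.test()`. The data below are simulated with the same group sizes as in this section (12 and 10); the `toy` data frame and its values are placeholders, not project data.

```r
# Simulated stand-in data for the two PFOS-dosed groups
set.seed(4)
toy <- data.frame(
  treatment = factor(rep(c("PFOS", "VAN+PFOS"), times = c(12, 10))),
  outcome = c(rnorm(12, mean = 0.165, sd = 0.03), rnorm(10, mean = 0.183, sd = 0.03))
)

# Student t-test: pooled variance, df = n1 + n2 - 2
tt_student <- t.test(outcome ~ treatment, data = toy, var.equal = TRUE)

# Welch t-test: separate variances, Satterthwaite-approximated df
tt_welch <- t.test(outcome ~ treatment, data = toy, var.equal = FALSE)

c(student_df = unname(tt_student$parameter), welch_df = unname(tt_welch$parameter))
```

The Student version always has 20 degrees of freedom here, while the Welch version has at most 20 and drops further as the two group variances diverge.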

5.1.2.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1  -0.0171     0.165     0.183 pfos_s… PFOS   VAN+P…    12    10     -1.23 0.232
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The output provides:

  • .y.: the y variable used in the test.

  • group1,group2: the compared groups in the pairwise tests.

  • statistic: Test statistic used to compute the p-value.

  • df: degrees of freedom.

  • p: p-value.

  • p.adj: the adjusted p-value.

  • method: the statistical test used to compare groups.

  • p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.

  • estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.

  • estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.

  • alternative: a character string describing the alternative hypothesis.

  • conf.low,conf.high: Lower and upper bound on a confidence interval.

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.            group1 group2   effsize    n1    n2 magnitude
## * <chr>          <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum4_mg PFOS   VAN+PFOS  -0.528    12    10 moderate
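
For a Student t-test with a pooled standard deviation, Cohen's d and the t statistic are linked by d = t × sqrt(1/n1 + 1/n2), which gives a quick consistency check between the two outputs. The sketch uses the rounded values printed above; the variable names are chosen for illustration.

```r
# Values taken from the t-test output above
t_stat <- -1.23
n1 <- 12  # PFOS
n2 <- 10  # VAN+PFOS

# Cohen's d implied by the t statistic under the pooled-variance model
d_from_t <- t_stat * sqrt(1 / n1 + 1 / n2)
round(d_from_t, 2)  # -0.53, close to the reported -0.528
```

The remaining gap is explained by the t statistic being rounded to two decimals in the printed table.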

5.1.2.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,0.30),breaks = seq(0,0.30,0.1)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(0.28))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
Total mg PFOS in blood volume

5.1.3 Pct.

Data for PFOS levels in serum at day 4, expressed as a percentage of the total PFOS dosed at that time point.

Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_pct"
SUBJECT <- "rat_name"

# Subset to PFOS-dosed animals only
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_pct))

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.1.3.1 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R29            17 yes   no     239.  243.  248.  255.  259.   267
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Extreme outliers can result from bad samples or errors in data entry. If outliers are present, run the test with and without them to determine whether they affect the result, or run a non-parametric Wilcoxon test instead.
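
The boxplot rule behind this check can be sketched in base R. The example assumes the usual convention that values more than 1.5 × IQR outside the quartiles are outliers and values more than 3 × IQR outside are extreme. The helper `flag_outliers()` and the vector `toy` are hypothetical and not part of the project code.

```r
# Hypothetical helper: flag outliers with the boxplot (IQR) rule
flag_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  data.frame(
    value      = x,
    is.outlier = x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr,
    is.extreme = x < q[1] - 3 * iqr | x > q[2] + 3 * iqr
  )
}

# Invented values; the last one lies far above the rest
toy <- c(6.1, 6.5, 6.8, 7.0, 7.2, 7.4, 12.9)
res <- flag_outliers(toy)
res
```

Only the last value is flagged in this toy example, and it is flagged as both an outlier and an extreme point.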

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable        statistic     p
##   <chr>     <chr>               <dbl> <dbl>
## 1 PFOS      pfos_serum4_pct     0.937 0.456
## 2 VAN+PFOS  pfos_serum4_pct     0.939 0.547
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro-Wilk tests give p > 0.05 and/or the points in the QQ plot follow the reference line, the data can be treated as normally distributed.

If the data do not follow a normal distribution, run a Wilcoxon rank-sum test instead.
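
The decision rule above can be sketched in base R as follows. The data are simulated and the object names (`toy`, `shapiro_p`, `res`) are placeholders. The sketch relies on the Shapiro-Wilk p-values only, whereas the report also inspects the QQ plots.

```r
# Simulated two-group data (illustration only)
set.seed(42)
toy <- data.frame(
  group = rep(c("A", "B"), each = 12),
  value = c(rnorm(12, mean = 7, sd = 1), rnorm(12, mean = 7.5, sd = 1))
)

# Shapiro-Wilk p-value for each group
shapiro_p <- tapply(toy$value, toy$group, function(v) shapiro.test(v)$p.value)

# t-test if both groups look normal, otherwise the Wilcoxon rank-sum test
res <- if (all(shapiro_p > 0.05)) {
  t.test(value ~ group, data = toy)
} else {
  wilcox.test(value ~ group, data = toy)
}
res$method
```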

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    20      1.43 0.246
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).
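
How the Levene result feeds into the choice between the Student and Welch t-test can be sketched in base R. The sketch assumes the median-centred variant of Levene's test, computed as a one-way ANOVA on the absolute deviations from the group medians. The helper `levene_p()` and the simulated data are hypothetical.

```r
# Hypothetical helper: median-centred Levene test, returns the p-value
levene_p <- function(value, group) {
  group <- factor(group)
  dev <- abs(value - ave(value, group, FUN = median))
  anova(lm(dev ~ group))[["Pr(>F)"]][1]
}

# Simulated data with unequal spread between the groups
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), each = 12),
  value = c(rnorm(12, mean = 7, sd = 0.5), rnorm(12, mean = 7, sd = 2))
)

# TRUE selects the Student t-test, FALSE the Welch t-test
p_levene  <- levene_p(toy$value, toy$group)
equal_var <- p_levene > 0.05
res <- t.test(value ~ group, data = toy, var.equal = equal_var)
res$method
```

Storing the decision in a logical, as the report does with `EQUAL.VAR`, keeps the test call identical for both cases.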

5.1.3.2 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1   -0.626      6.75      7.37 pfos_s… PFOS   VAN+P…    12    10     -1.17 0.254
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The output provides:

  • .y.: the y variable used in the test.

  • group1,group2: the compared groups in the pairwise tests.

  • statistic: Test statistic used to compute the p-value.

  • df: degrees of freedom.

  • p: p-value.

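  • p.adj: the adjusted p-value (not shown in the output above, which contains a single comparison).

  • method: the statistical test used to compare groups.

  • p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively (only p.signif appears in the output above).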

  • estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.

  • estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.

  • alternative: a character string describing the alternative hypothesis.

  • conf.low,conf.high: Lower and upper bound on a confidence interval.

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.             group1 group2   effsize    n1    n2 magnitude
## * <chr>           <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum4_pct PFOS   VAN+PFOS  -0.503    12    10 moderate
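
For reference, Cohen's d for two independent groups with equal variances is the difference in means divided by the pooled standard deviation. The base-R sketch below assumes this pooled-SD form, which corresponds to the equal-variance case. The helper `cohens_d_pooled()` and the input vectors are hypothetical.

```r
# Hypothetical helper: Cohen's d with a pooled standard deviation
cohens_d_pooled <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / sd_pooled
}

# Invented values: means differ by 1 and both groups have SD 1
d <- cohens_d_pooled(c(1, 2, 3), c(2, 3, 4))
d
```

By Cohen's conventional benchmarks, absolute values of about 0.2, 0.5 and 0.8 are read as small, moderate and large effects.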

5.1.3.3 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% of total dosed PFOS", limits = c(3,10),breaks = seq(3,10,1)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
PFOS serum day 4 in pct. of total

5.2 Blood serum day 8

This section analyses the PFOS data from serum collected on day 8.

5.2.1 ug/mL in serum

5.2.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum8_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum8_ugml"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable             n   mean     sd
##   <chr>     <fct>            <dbl>  <dbl>  <dbl>
## 1 CTRL      pfos_serum8_ugml    12  0.016  0.03 
## 2 PFOS      pfos_serum8_ugml    12 36.3   15.5  
## 3 VAN       pfos_serum8_ugml    12  0.011  0.021
## 4 VAN+PFOS  pfos_serum8_ugml    12 32.2   10.7
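
The formula construction above handles both a single predictor and several interacting predictors. A minimal base-R sketch of the same idea follows. The helper name `build_formula()` is hypothetical, while the column names are taken from the data set.

```r
# Hypothetical helper: build "outcome ~ predictor(s)" from character input
build_formula <- function(outcome, predictor) {
  # Several predictors are joined with "*" to include their interaction
  rhs <- paste(predictor, collapse = "*")
  as.formula(paste(outcome, rhs, sep = " ~ "))
}

f_one <- build_formula("pfos_serum8_ugml", "treatment")
f_two <- build_formula("pfos_serum8_ugml", c("pfos", "van"))
f_one
f_two
```

With one predictor the result is a one-way model, and with two predictors it is a two-way model including the interaction term.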

5.2.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 7 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R05             5 no    no     214   219.  222.  229.  231.   237
## 2 CTRL      R09             9 no    no     273.  278.  290.  290.  296.   302
## 3 CTRL      R12            12 no    no     297.  304.  316.  322.  324.   334
## 4 VAN       R14            26 no    yes    246.  256.  260.  267.  270.   274
## 5 VAN       R16            28 no    yes    256.  260   268.  275.  273    279
## 6 VAN       R24            36 no    yes    281.  286.  294.  305.  309.   312
## 7 VAN+PFOS  R47            47 yes   yes    242.  249.  255.  263.  267.   271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain seven outliers: R05, R09 and R12 (CTRL), R14, R16 and R24 (VAN), and R47 (VAN+PFOS).

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic    p.value
##   <chr>                <dbl>      <dbl>
## 1 residuals(model)     0.823 0.00000461

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  2. It’s also possible to use Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic         p
##   <int> <int>     <dbl>     <dbl>
## 1     3    44      8.92 0.0000993
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the day 8 serum PFOS concentration data contain outliers, do not have equal variances (Levene’s test, p < 0.001) and are not normally distributed (Shapiro-Wilk test on the residuals, p < 0.001). We therefore use a non-parametric Kruskal-Wallis test, followed by Dunn’s post-hoc test with FDR-adjusted p-values.

5.2.1.3 Kruskal-Wallis test

5.2.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.                  n statistic    df            p method        
## * <chr>            <int>     <dbl> <int>        <dbl> <chr>         
## 1 pfos_serum8_ugml    48      37.3     3 0.0000000402 Kruskal-Wallis
5.2.1.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as a measure of the Kruskal-Wallis test effect size. The interpretation thresholds commonly used in the published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.                  n effsize method  magnitude
## * <chr>            <int>   <dbl> <chr>   <ord>    
## 1 pfos_serum8_ugml    48   0.779 eta2[H] large
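
As a worked check of the effect size, the sketch below assumes the commonly given formula eta2[H] = (H - k + 1) / (n - k), where H is the Kruskal-Wallis statistic, k the number of groups and n the total number of observations. The inputs are the rounded values printed above, so the result agrees with the reported 0.779 only up to rounding.

```r
# Values taken from the Kruskal-Wallis output above (H is rounded)
H <- 37.3  # test statistic
k <- 4     # number of treatment groups
n <- 48    # total number of observations

# Eta squared based on the H-statistic (assumed formula, see lead-in)
eta2_H <- (H - k + 1) / (n - k)
eta2_H
```
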
5.2.1.3.0.3 Post-hoc test if the Kruskal-Wallis test is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_serum8_… CTRL   PFOS      12    12    4.37   1.26e-5 3.78e-5 ****        
## 2 pfos_serum8_… CTRL   VAN       12    12   -0.0899 9.28e-1 9.28e-1 ns          
## 3 pfos_serum8_… CTRL   VAN+P…    12    12    4.17   3.02e-5 4.53e-5 ****        
## 4 pfos_serum8_… PFOS   VAN       12    12   -4.46   8.32e-6 3.78e-5 ****        
## 5 pfos_serum8_… PFOS   VAN+P…    12    12   -0.195  8.46e-1 9.28e-1 ns          
## 6 pfos_serum8_… VAN    VAN+P…    12    12    4.26   2.03e-5 4.05e-5 ****
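
The FDR adjustment in the table can be reproduced from the unadjusted p-values with base R's `p.adjust()`, where `"fdr"` is an alias for the Benjamini-Hochberg method. The sketch below uses the rounded p-values printed above, so small differences in the last digit are expected.

```r
# Unadjusted p-values copied from the Dunn test output (rounded)
p_raw <- c(1.26e-5, 9.28e-1, 3.02e-5, 8.32e-6, 8.46e-1, 2.03e-5)

# Benjamini-Hochberg (FDR) adjustment across the six comparisons
p_adj <- p.adjust(p_raw, method = "fdr")
signif(p_adj, 3)
```

The two non-significant comparisons share the same adjusted value because the adjustment enforces monotonicity across the ranked p-values.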

5.2.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "ug/mL",limits = c(0,80),breaks = seq(0,80,10)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(72,80,75,72))
p
## Warning: Removed 11 rows containing missing values (`geom_point()`).

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 11 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 11 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
PFOS conc. in ug/mL at day 8

5.2.2 Total mg in serum

5.2.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_mg"
SUBJECT <- "rat_name"

# Subset to a specific variable
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_mg))

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.2.2.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rows> (or 0-length row.names)

Extreme outliers can result from bad samples or errors in data entry. If outliers are present, run the test with and without them to determine whether they affect the result, or run a non-parametric Wilcoxon test instead.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable       statistic     p
##   <chr>     <chr>              <dbl> <dbl>
## 1 PFOS      pfos_serum8_mg     0.890 0.119
## 2 VAN+PFOS  pfos_serum8_mg     0.902 0.168
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro-Wilk tests give p > 0.05 and/or the points in the QQ plot follow the reference line, the data can be treated as normally distributed.

If the data do not follow a normal distribution, run a Wilcoxon rank-sum test instead.

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22      1.98 0.173
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

No outliers were identified. The data are normally distributed and have equal variance. Hence we use a Student t-test.

5.2.2.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1   0.0931     0.717     0.624 pfos_s… PFOS   VAN+P…    12    12     0.853 0.403
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.            group1 group2   effsize    n1    n2 magnitude
## * <chr>          <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum8_mg PFOS   VAN+PFOS   0.348    12    12 small

5.2.2.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,2),breaks = seq(0,2,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(1.75))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
Total mg PFOS in blood volume

5.2.3 Pct.

5.2.3.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_pct"
SUBJECT <- "rat_name"

# Subset to a specific variable (rat R47 is excluded)
dat.clean <- subset(dat, pfos == "yes" & rat_name != "R47")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_pct))

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.2.3.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rows> (or 0-length row.names)

Extreme outliers can result from bad samples or errors in data entry. If outliers are present, run the test with and without them to determine whether they affect the result, or run a non-parametric Wilcoxon test instead.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable        statistic      p
##   <chr>     <chr>               <dbl>  <dbl>
## 1 PFOS      pfos_serum8_pct     0.867 0.0594
## 2 VAN+PFOS  pfos_serum8_pct     0.899 0.181
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro-Wilk tests give p > 0.05 and/or the points in the QQ plot follow the reference line, the data can be treated as normally distributed.

If the data do not follow a normal distribution, run a Wilcoxon rank-sum test instead.

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    21      2.86 0.106
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).

With rat R47 excluded during data preparation, no outliers were identified. The data are normally distributed and have equal variance. Hence we use a Student t-test.

5.2.3.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1     2.05      11.8      9.74 pfos_s… PFOS   VAN+P…    12    11      1.29 0.212
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.             group1 group2   effsize    n1    n2 magnitude
## * <chr>           <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_serum8_pct PFOS   VAN+PFOS   0.537    12    11 moderate

5.2.3.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% of total dosed PFOS", limits = c(5,25),breaks = seq(5,25,5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(24))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
PFOS serum day 8 in pct. of total

5.3 Blood serum day 4 and 8

This section analyses the PFOS data from serum collected on days 4 and 8.

5.3.1 Change from day 4 to 8 (Pct.)

5.3.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_change48_pct"
SUBJECT <- "rat_name"

# Subset to a specific variable
dat.clean <- subset(dat, pfos == "yes") # add following to subset() to remove the outliers: & !rat_name %in% c("R47","R27"))

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_change48_pct))

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.3.1.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R27            15 yes   no     270.  280.  284.  291.  290.   296
## 2 VAN+PFOS  R47            47 yes   yes    242.  249.  255.  263.  267.   271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Extreme outliers can result from bad samples or data-entry errors. If outliers are present, run the test with and without them to determine whether they affect the result, or use a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data are normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. A QQ plot compares the sample quantiles with the theoretical quantiles of a normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable          statistic      p
##   <chr>     <chr>                 <dbl>  <dbl>
## 1 PFOS      pfos_change48_pct     0.894 0.132 
## 2 VAN+PFOS  pfos_change48_pct     0.805 0.0167
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro-Wilk tests give p > 0.05 and/or the points in the QQ plot follow the reference line, the data can be considered normally distributed.

If the data do not follow a normal distribution, run a Wilcoxon rank-sum test.

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    20     0.639 0.434
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

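Two outliers were identified (samples from R27 and R47). The analysis result and the choice of test are similar with and without the outliers. Levene’s test indicates equal variances (p = 0.434). With the outliers included, the Shapiro-Wilk test gives p = 0.132 for PFOS and p = 0.017 for VAN+PFOS, so normality holds only approximately for the second group; the data are treated as normally distributed and a t-test is used.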

5.3.1.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = PAIRED,  # use the flag set above (FALSE here)
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1     85.1      341.      256. pfos_c… PFOS   VAN+P…    12    10      1.11 0.281
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = PAIRED)  # use the flag set above (FALSE here)
## # A tibble: 1 × 7
##   .y.               group1 group2   effsize    n1    n2 magnitude
## * <chr>             <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_change48_pct PFOS   VAN+PFOS   0.475    12    10 small
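
As a consistency check, for an unpaired two-sample test with pooled variance the t statistic and Cohen’s d are linked by t = d · sqrt(n1·n2 / (n1 + n2)). The sketch below plugs in the rounded values printed above; it assumes the pooled-SD form of Cohen’s d (which applies here because `EQUAL.VAR` is `TRUE`), and the variable names are illustrative only:

```r
# Values copied from the printed output above (rounded)
d        <- 0.475   # Cohen's d
n1       <- 12      # PFOS
n2       <- 10      # VAN+PFOS
estimate <- 85.1    # difference in group means

# t statistic implied by d under the pooled-variance t-test
t_from_d <- d * sqrt(n1 * n2 / (n1 + n2))
round(t_from_d, 2)  # 1.11, matching the reported statistic

# Pooled standard deviation implied by the estimate and d (roughly 179)
pooled_sd <- estimate / d
pooled_sd
```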

5.3.1.4 Conclusion

5.3.1.5 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create point plot with mean and SD
data_summary <- function(x) {
  m <- mean(x)
  ymin <- m-sd(x)
  ymax <- m+sd(x)
  return(c(y=m,ymin=ymin,ymax=ymax))
}
data_summary_collapsed <- function(x) {
  m <- mean(x)
  ymin <- m
  ymax <- m
  return(c(y=m,ymin=ymin,ymax=ymax))
}

p <- ggplot(dat.clean, aes(x = .data[[PREDICTOR]], y = .data[[OUTCOME]], color = .data[[PREDICTOR]])) +
  stat_summary(fun.data = data_summary_collapsed, geom = "crossbar", color = "black", width = 0.5, linewidth = 0.3) +
  stat_summary(fun.data = data_summary, geom = "errorbar", color = "black", width = 0.15, linewidth = 0.5) +
  geom_point(position = position_jitterdodge(dodge.width = 0.6, jitter.width = 0.4), size = 2, colour = "black", shape = 21, stroke = 0.5, aes(fill = treatment)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100), labels = function(x) paste0(x, "%")) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme_pubr()
p

# Alternative: Create boxplot
# p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
#           fill = PREDICTOR,
#           add =  "jitter",
#           add.params = list(size = 1)) +
#   scale_fill_manual(values = params$COL) +
#   scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100)) +
#   labs(fill = "Treatment") +
#   scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

p

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)

# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # removes all objects, including hidden ones
invisible(gc()) # free up memory; invisible() suppresses the usage report
PFOS serum level change from day 4 to 8 in pct.

5.3.2 Data ug/mL

5.3.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")

# Subset data
dat.sub <- subset(dat, pfos == "yes")

# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_ugml, pfos_serum8_ugml) %>%
  pivot_longer(., cols = c(pfos_serum4_ugml, pfos_serum8_ugml), names_to = "data_group", values_to = "conc")

# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_ugml","d8","d4"))

# Create ID column for easier handling (paste() is vectorised, so no loop is needed)
dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment)

# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(conc))
dat.clean
##    rat_name treatment       data_group  conc day            ID
## 1       R25      PFOS pfos_serum4_ugml  8.56  d4     d4 _ PFOS
## 3       R26      PFOS pfos_serum4_ugml  9.60  d4     d4 _ PFOS
## 5       R27      PFOS pfos_serum4_ugml  7.92  d4     d4 _ PFOS
## 7       R28      PFOS pfos_serum4_ugml  8.64  d4     d4 _ PFOS
## 9       R29      PFOS pfos_serum4_ugml 12.96  d4     d4 _ PFOS
## 11      R30      PFOS pfos_serum4_ugml  8.68  d4     d4 _ PFOS
## 13      R31      PFOS pfos_serum4_ugml  5.72  d4     d4 _ PFOS
## 15      R32      PFOS pfos_serum4_ugml  7.56  d4     d4 _ PFOS
## 17      R33      PFOS pfos_serum4_ugml  8.24  d4     d4 _ PFOS
## 19      R34      PFOS pfos_serum4_ugml 11.36  d4     d4 _ PFOS
## 21      R35      PFOS pfos_serum4_ugml  9.00  d4     d4 _ PFOS
## 23      R36      PFOS pfos_serum4_ugml 11.84  d4     d4 _ PFOS
## 25      R37  VAN+PFOS pfos_serum4_ugml  9.36  d4 d4 _ VAN+PFOS
## 27      R38  VAN+PFOS pfos_serum4_ugml 11.76  d4 d4 _ VAN+PFOS
## 29      R39  VAN+PFOS pfos_serum4_ugml 10.64  d4 d4 _ VAN+PFOS
## 31      R40  VAN+PFOS pfos_serum4_ugml 12.12  d4 d4 _ VAN+PFOS
## 33      R41  VAN+PFOS pfos_serum4_ugml  9.12  d4 d4 _ VAN+PFOS
## 35      R42  VAN+PFOS pfos_serum4_ugml  9.80  d4 d4 _ VAN+PFOS
## 37      R43  VAN+PFOS pfos_serum4_ugml 10.08  d4 d4 _ VAN+PFOS
## 43      R46  VAN+PFOS pfos_serum4_ugml  9.88  d4 d4 _ VAN+PFOS
## 45      R47  VAN+PFOS pfos_serum4_ugml  8.80  d4 d4 _ VAN+PFOS
## 47      R48  VAN+PFOS pfos_serum4_ugml  8.60  d4 d4 _ VAN+PFOS
## 2       R25      PFOS pfos_serum8_ugml 43.92  d8     d8 _ PFOS
## 4       R26      PFOS pfos_serum8_ugml 45.60  d8     d8 _ PFOS
## 6       R27      PFOS pfos_serum8_ugml 69.92  d8     d8 _ PFOS
## 8       R28      PFOS pfos_serum8_ugml 21.92  d8     d8 _ PFOS
## 10      R29      PFOS pfos_serum8_ugml 26.56  d8     d8 _ PFOS
## 12      R30      PFOS pfos_serum8_ugml 47.84  d8     d8 _ PFOS
## 14      R31      PFOS pfos_serum8_ugml 21.84  d8     d8 _ PFOS
## 16      R32      PFOS pfos_serum8_ugml 29.36  d8     d8 _ PFOS
## 18      R33      PFOS pfos_serum8_ugml 22.08  d8     d8 _ PFOS
## 20      R34      PFOS pfos_serum8_ugml 52.48  d8     d8 _ PFOS
## 22      R35      PFOS pfos_serum8_ugml 29.84  d8     d8 _ PFOS
## 24      R36      PFOS pfos_serum8_ugml 23.76  d8     d8 _ PFOS
## 26      R37  VAN+PFOS pfos_serum8_ugml 36.08  d8 d8 _ VAN+PFOS
## 28      R38  VAN+PFOS pfos_serum8_ugml 31.28  d8 d8 _ VAN+PFOS
## 30      R39  VAN+PFOS pfos_serum8_ugml 25.92  d8 d8 _ VAN+PFOS
## 32      R40  VAN+PFOS pfos_serum8_ugml 23.92  d8 d8 _ VAN+PFOS
## 34      R41  VAN+PFOS pfos_serum8_ugml 21.68  d8 d8 _ VAN+PFOS
## 36      R42  VAN+PFOS pfos_serum8_ugml 40.96  d8 d8 _ VAN+PFOS
## 38      R43  VAN+PFOS pfos_serum8_ugml 25.60  d8 d8 _ VAN+PFOS
## 40      R44  VAN+PFOS pfos_serum8_ugml 37.44  d8 d8 _ VAN+PFOS
## 42      R45  VAN+PFOS pfos_serum8_ugml 25.36  d8 d8 _ VAN+PFOS
## 44      R46  VAN+PFOS pfos_serum8_ugml 34.72  d8 d8 _ VAN+PFOS
## 46      R47  VAN+PFOS pfos_serum8_ugml 59.52  d8 d8 _ VAN+PFOS
## 48      R48  VAN+PFOS pfos_serum8_ugml 23.76  d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "conc"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   ID            variable     n  mean    sd
##   <chr>         <fct>    <dbl> <dbl> <dbl>
## 1 d4 _ PFOS     conc        12  9.17  2.01
## 2 d4 _ VAN+PFOS conc        10 10.0   1.18
## 3 d8 _ PFOS     conc        12 36.3  15.5 
## 4 d8 _ VAN+PFOS conc        12 32.2  10.7

5.3.2.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = COL)
bxp

5.3.2.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 8
##   ID            rat_name treatment data_group   conc day   is.outlier is.extreme
##   <chr>         <chr>    <chr>     <chr>       <dbl> <chr> <lgl>      <lgl>     
## 1 d4 _ PFOS     R29      PFOS      pfos_serum…  13.0 d4    TRUE       FALSE     
## 2 d8 _ VAN+PFOS R47      VAN+PFOS  pfos_serum…  59.5 d8    TRUE       FALSE

Data contains two outliers: the samples from R29 (day 4, PFOS) and R47 (day 8, VAN+PFOS). Neither is flagged as extreme.
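
The boxplot rule behind `identify_outliers()` flags values more than 1.5 × IQR outside the quartiles, and marks them as extreme beyond 3 × IQR. A minimal base R sketch applied to the day 8 VAN+PFOS concentrations listed above; the helper name `flag_outliers` is hypothetical and the default `quantile()` definition is assumed:

```r
# Flag values outside [Q1 - coef * IQR, Q3 + coef * IQR]
flag_outliers <- function(x, coef = 1.5) {
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Day 8 serum concentrations (ug/mL) for VAN+PFOS, R37 to R48
d8_van <- c(36.08, 31.28, 25.92, 23.92, 21.68, 40.96,
            25.60, 37.44, 25.36, 34.72, 59.52, 23.76)

d8_van[flag_outliers(d8_van)]            # 59.52 (R47): outlier
d8_van[flag_outliers(d8_van, coef = 3)]  # numeric(0): not extreme
```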

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.875 0.000151
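
For a one-way model such as `lm(conc ~ ID)`, each residual is the observation minus its group mean, so the test above assesses the pooled within-group deviations. A small base R sketch using the concentrations listed in 5.3.2.1; the object names `conc`, `id` and `fit` are illustrative and not taken from the original script:

```r
# Serum concentrations (ug/mL) as listed in 5.3.2.1
d4_pfos <- c(8.56, 9.60, 7.92, 8.64, 12.96, 8.68, 5.72, 7.56, 8.24, 11.36, 9.00, 11.84)
d4_van  <- c(9.36, 11.76, 10.64, 12.12, 9.12, 9.80, 10.08, 9.88, 8.80, 8.60)
d8_pfos <- c(43.92, 45.60, 69.92, 21.92, 26.56, 47.84, 21.84, 29.36, 22.08, 52.48, 29.84, 23.76)
d8_van  <- c(36.08, 31.28, 25.92, 23.92, 21.68, 40.96, 25.60, 37.44, 25.36, 34.72, 59.52, 23.76)

conc <- c(d4_pfos, d4_van, d8_pfos, d8_van)
id   <- factor(rep(c("d4 _ PFOS", "d4 _ VAN+PFOS", "d8 _ PFOS", "d8 _ VAN+PFOS"),
                   times = c(12, 10, 12, 12)))

fit <- lm(conc ~ id)

# Residuals of a one-way model equal the deviations from the group means
max_diff <- max(abs(unname(residuals(fit)) - (conc - ave(conc, id))))
max_diff  # numerically zero

# Shapiro-Wilk test on the pooled residuals (base R counterpart of shapiro_test())
shapiro.test(residuals(fit))
```

Because the day 8 groups have a much larger spread than the day 4 groups, the pooled residuals mix two scales, which is one reason the test rejects normality here.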

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic       p
##   <int> <int>     <dbl>   <dbl>
## 1     3    42      6.46 0.00108
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
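
Levene’s test can be read as a one-way ANOVA on the absolute deviations of each observation from its group centre. The sketch below assumes median centring, which is the default of `car::leveneTest()` (the function named in the warning above); the object names are illustrative, and the data are the concentrations listed in 5.3.2.1:

```r
# Serum concentrations (ug/mL) as listed in 5.3.2.1
conc <- c(8.56, 9.60, 7.92, 8.64, 12.96, 8.68, 5.72, 7.56, 8.24, 11.36, 9.00, 11.84,
          9.36, 11.76, 10.64, 12.12, 9.12, 9.80, 10.08, 9.88, 8.80, 8.60,
          43.92, 45.60, 69.92, 21.92, 26.56, 47.84, 21.84, 29.36, 22.08, 52.48, 29.84, 23.76,
          36.08, 31.28, 25.92, 23.92, 21.68, 40.96, 25.60, 37.44, 25.36, 34.72, 59.52, 23.76)
id <- factor(rep(c("d4 _ PFOS", "d4 _ VAN+PFOS", "d8 _ PFOS", "d8 _ VAN+PFOS"),
                 times = c(12, 10, 12, 12)))

# Absolute deviation of each value from its group median
abs_dev <- abs(conc - ave(conc, id, FUN = median))

# One-way ANOVA on the absolute deviations
levene_fit <- anova(lm(abs_dev ~ id))
levene_fit
```

The degrees of freedom (3 and 42) match the output above; under the stated assumption the F statistic and p-value should correspond to the reported test as well.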

This shows that the PFOS concentrations from day 4 and 8 contain two outliers, have unequal variances (Levene’s test, p = 0.001), and deviate from normality (Shapiro-Wilk test on the residuals, p < 0.001). We therefore use a non-parametric Kruskal-Wallis test, followed by Dunn’s post-hoc test with FDR-adjusted p-values.

5.3.2.4 Kruskal-Wallis test

5.3.2.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.       n statistic    df           p method        
## * <chr> <int>     <dbl> <int>       <dbl> <chr>         
## 1 conc     46      34.4     3 0.000000165 Kruskal-Wallis
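
The Kruskal-Wallis H statistic is computed from the rank sums of the groups, with a correction for ties. A minimal sketch of that calculation, checked against base R’s `kruskal.test()` on a small subset (the first four values of each group from the listing in 5.3.2.1); the helper `kw_H` is hypothetical and the subset will not reproduce the H reported above:

```r
# Kruskal-Wallis H from rank sums, with tie correction
kw_H <- function(x, g) {
  g <- factor(g)
  n <- length(x)
  r <- rank(x)
  rank_sums <- tapply(r, g, sum)
  group_n   <- tapply(r, g, length)
  H <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_n) - 3 * (n + 1)
  ties <- table(x)
  as.numeric(H / (1 - sum(ties^3 - ties) / (n^3 - n)))
}

# First four values per group from the listing in 5.3.2.1 (illustrative subset)
conc_sub <- c(8.56, 9.60, 7.92, 8.64,      # d4 _ PFOS
              9.36, 11.76, 10.64, 12.12,   # d4 _ VAN+PFOS
              43.92, 45.60, 69.92, 21.92,  # d8 _ PFOS
              36.08, 31.28, 25.92, 23.92)  # d8 _ VAN+PFOS
id_sub <- factor(rep(c("d4 _ PFOS", "d4 _ VAN+PFOS", "d8 _ PFOS", "d8 _ VAN+PFOS"),
                     each = 4))

H_manual <- kw_H(conc_sub, id_sub)
H_base   <- unname(kruskal.test(conc_sub, id_sub)$statistic)
c(manual = H_manual, base = H_base)
```
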
5.3.2.4.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>    
## 1 conc     46   0.747 eta2[H] large
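
The H-based eta squared is (H − k + 1) / (n − k), where k is the number of groups and n the number of observations. A short sketch that plugs in the rounded values from the Kruskal-Wallis output above; the function name `kruskal_eta2` is illustrative only:

```r
# Eta squared based on the Kruskal-Wallis H statistic
kruskal_eta2 <- function(H, k, n) (H - k + 1) / (n - k)

# H = 34.4 (rounded), k = 4 groups, n = 46 observations
eta2 <- kruskal_eta2(H = 34.4, k = 4, n = 46)
eta2  # about 0.748 from the rounded H; the output above reports 0.747
```
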
5.3.2.4.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.   group1        group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr> <chr>         <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 conc  d4 _ PFOS     d4 _ V…    12    10     0.798 4.25e-1 5.10e-1 ns          
## 2 conc  d4 _ PFOS     d8 _ P…    12    12     4.68  2.92e-6 1.75e-5 ****        
## 3 conc  d4 _ PFOS     d8 _ V…    12    12     4.48  7.51e-6 2.25e-5 ****        
## 4 conc  d4 _ VAN+PFOS d8 _ P…    10    12     3.66  2.51e-4 5.02e-4 ***         
## 5 conc  d4 _ VAN+PFOS d8 _ V…    10    12     3.47  5.15e-4 7.73e-4 ***         
## 6 conc  d8 _ PFOS     d8 _ V…    12    12    -0.198 8.43e-1 8.43e-1 ns
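
The `p.adj` column is the Benjamini-Hochberg (FDR) adjustment of the six raw p-values. A minimal sketch using base R’s `p.adjust()` on the rounded p-values printed above; small differences in the last digit are expected because the inputs are rounded:

```r
# Raw p-values from the Dunn's test output above (rounded to 3 significant digits)
p_raw <- c(4.25e-1, 2.92e-6, 7.51e-6, 2.51e-4, 5.15e-4, 8.43e-1)

# Benjamini-Hochberg adjustment: p * m / rank, made monotone from the largest p down
p_fdr <- p.adjust(p_raw, method = "fdr")

# Compare with the p.adj column of the table above
signif(p_fdr, 3)
```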

5.3.2.5 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
  scale_y_continuous(name = "ug/mL",limits = c(0,85),breaks = seq(0,85,10)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment", labels = c("PFOS\nDay 4","VAN+PFOS\nDay 4","PFOS\nDay 8","VAN+PFOS\nDay 8")) +
  theme(axis.title.x = element_blank())

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(75,85,70,80))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # removes all objects, including hidden ones
invisible(gc()) # free up memory; invisible() suppresses the usage report
PFOS conc. in ug/mL at day 4 and 8

5.3.3 Data mg

5.3.3.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")

# Subset data
dat.sub <- subset(dat, pfos == "yes")

# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_mg, pfos_serum8_mg) %>%
  pivot_longer(., cols = c(pfos_serum4_mg, pfos_serum8_mg), names_to = "data_group", values_to = "mg")

# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_mg","d8","d4"))

# Create ID column for easier handling (paste() is vectorised, so no loop is needed)
dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment)

# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(mg))
dat.clean
##    rat_name treatment     data_group        mg day            ID
## 1       R25      PFOS pfos_serum4_mg 0.1904292  d4     d4 _ PFOS
## 3       R26      PFOS pfos_serum4_mg 0.1592525  d4     d4 _ PFOS
## 5       R27      PFOS pfos_serum4_mg 0.1472486  d4     d4 _ PFOS
## 7       R28      PFOS pfos_serum4_mg 0.1480827  d4     d4 _ PFOS
## 9       R29      PFOS pfos_serum4_mg 0.2149079  d4     d4 _ PFOS
## 11      R30      PFOS pfos_serum4_mg 0.1498237  d4     d4 _ PFOS
## 13      R31      PFOS pfos_serum4_mg 0.1123500  d4     d4 _ PFOS
## 15      R32      PFOS pfos_serum4_mg 0.1359590  d4     d4 _ PFOS
## 17      R33      PFOS pfos_serum4_mg 0.1546747  d4     d4 _ PFOS
## 19      R34      PFOS pfos_serum4_mg 0.2121503  d4     d4 _ PFOS
## 21      R35      PFOS pfos_serum4_mg 0.1792512  d4     d4 _ PFOS
## 23      R36      PFOS pfos_serum4_mg 0.1811804  d4     d4 _ PFOS
## 25      R37  VAN+PFOS pfos_serum4_mg 0.1927112  d4 d4 _ VAN+PFOS
## 27      R38  VAN+PFOS pfos_serum4_mg 0.2344474  d4 d4 _ VAN+PFOS
## 29      R39  VAN+PFOS pfos_serum4_mg 0.1721467  d4 d4 _ VAN+PFOS
## 31      R40  VAN+PFOS pfos_serum4_mg 0.2381338  d4 d4 _ VAN+PFOS
## 33      R41  VAN+PFOS pfos_serum4_mg 0.1657651  d4 d4 _ VAN+PFOS
## 35      R42  VAN+PFOS pfos_serum4_mg 0.1652672  d4 d4 _ VAN+PFOS
## 37      R43  VAN+PFOS pfos_serum4_mg 0.2037289  d4 d4 _ VAN+PFOS
## 43      R46  VAN+PFOS pfos_serum4_mg 0.1702838  d4 d4 _ VAN+PFOS
## 45      R47  VAN+PFOS pfos_serum4_mg 0.1501491  d4 d4 _ VAN+PFOS
## 47      R48  VAN+PFOS pfos_serum4_mg 0.1332518  d4 d4 _ VAN+PFOS
## 2       R25      PFOS pfos_serum8_mg 1.0930000  d8     d8 _ PFOS
## 4       R26      PFOS pfos_serum8_mg 0.8230000  d8     d8 _ PFOS
## 6       R27      PFOS pfos_serum8_mg 1.3690000  d8     d8 _ PFOS
## 8       R28      PFOS pfos_serum8_mg 0.3980000  d8     d8 _ PFOS
## 10      R29      PFOS pfos_serum8_mg 0.4810000  d8     d8 _ PFOS
## 12      R30      PFOS pfos_serum8_mg 0.8450000  d8     d8 _ PFOS
## 14      R31      PFOS pfos_serum8_mg 0.4720000  d8     d8 _ PFOS
## 16      R32      PFOS pfos_serum8_mg 0.5750000  d8     d8 _ PFOS
## 18      R33      PFOS pfos_serum8_mg 0.4550000  d8     d8 _ PFOS
## 20      R34      PFOS pfos_serum8_mg 1.0550000  d8     d8 _ PFOS
## 22      R35      PFOS pfos_serum8_mg 0.6510000  d8     d8 _ PFOS
## 24      R36      PFOS pfos_serum8_mg 0.3890000  d8     d8 _ PFOS
## 26      R37  VAN+PFOS pfos_serum8_mg 0.7850000  d8 d8 _ VAN+PFOS
## 28      R38  VAN+PFOS pfos_serum8_mg 0.6670000  d8 d8 _ VAN+PFOS
## 30      R39  VAN+PFOS pfos_serum8_mg 0.4530000  d8 d8 _ VAN+PFOS
## 32      R40  VAN+PFOS pfos_serum8_mg 0.5010000  d8 d8 _ VAN+PFOS
## 34      R41  VAN+PFOS pfos_serum8_mg 0.4270000  d8 d8 _ VAN+PFOS
## 36      R42  VAN+PFOS pfos_serum8_mg 0.7600000  d8 d8 _ VAN+PFOS
## 38      R43  VAN+PFOS pfos_serum8_mg 0.5540000  d8 d8 _ VAN+PFOS
## 40      R44  VAN+PFOS pfos_serum8_mg 0.7520000  d8 d8 _ VAN+PFOS
## 42      R45  VAN+PFOS pfos_serum8_mg 0.4590000  d8 d8 _ VAN+PFOS
## 44      R46  VAN+PFOS pfos_serum8_mg 0.6440000  d8 d8 _ VAN+PFOS
## 46      R47  VAN+PFOS pfos_serum8_mg 1.0890000  d8 d8 _ VAN+PFOS
## 48      R48  VAN+PFOS pfos_serum8_mg 0.3980000  d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "mg"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   ID            variable     n  mean    sd
##   <chr>         <fct>    <dbl> <dbl> <dbl>
## 1 d4 _ PFOS     mg          12 0.165 0.031
## 2 d4 _ VAN+PFOS mg          10 0.183 0.034
## 3 d8 _ PFOS     mg          12 0.717 0.32 
## 4 d8 _ VAN+PFOS mg          12 0.624 0.201
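
The mg values analysed here combine the serum concentration with an estimated blood volume of 64 mL per kg body weight (see the INFO section). A minimal sketch of that calculation; the function name is hypothetical, and the body weight of 290.5 g for R27 on day 4 is an assumption back-calculated from the listed values (it is consistent with the `bw_4` value printed as `290.` earlier):

```r
# Standard blood volume for rats, per the INFO section
BLOOD_ML_PER_KG <- 64

# Total PFOS in blood (mg) from serum concentration (ug/mL) and body weight (g)
pfos_in_blood_mg <- function(serum_ugml, bw_g) {
  blood_vol_ml <- BLOOD_ML_PER_KG * bw_g / 1000  # g -> kg
  serum_ugml * blood_vol_ml / 1000               # ug -> mg
}

# R27, day 4: 7.92 ug/mL at an assumed body weight of 290.5 g
r27_mg <- pfos_in_blood_mg(serum_ugml = 7.92, bw_g = 290.5)
round(r27_mg, 7)  # 0.1472486, as listed for R27 on day 4
```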

5.3.3.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = COL)
bxp

5.3.3.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## [1] ID         rat_name   treatment  data_group mg         day        is.outlier
## [8] is.extreme
## <0 rows> (or 0-length row.names)

No outliers were identified in any of the groups.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.896 0.000640

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. It’s also possible to use the Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic         p
##   <int> <int>     <dbl>     <dbl>
## 1     3    42      9.67 0.0000564
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the PFOS amounts (mg) from day 4 and 8 contain no outliers, but have unequal variances (Levene’s test, p < 0.001) and deviate from normality (Shapiro-Wilk test on the residuals, p < 0.001). We therefore use a non-parametric Kruskal-Wallis test, followed by Dunn’s post-hoc test with FDR-adjusted p-values.

5.3.3.4 Kruskal-Wallis test

5.3.3.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.       n statistic    df           p method        
## * <chr> <int>     <dbl> <int>       <dbl> <chr>         
## 1 mg       46      34.1     3 0.000000187 Kruskal-Wallis
5.3.3.4.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>    
## 1 mg       46   0.741 eta2[H] large
5.3.3.4.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.   group1        group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr> <chr>         <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 mg    d4 _ PFOS     d4 _ V…    12    10     0.574 5.66e-1 6.79e-1 ns          
## 2 mg    d4 _ PFOS     d8 _ P…    12    12     4.62  3.92e-6 2.35e-5 ****        
## 3 mg    d4 _ PFOS     d8 _ V…    12    12     4.33  1.51e-5 4.54e-5 ****        
## 4 mg    d4 _ VAN+PFOS d8 _ P…    10    12     3.83  1.30e-4 2.60e-4 ***         
## 5 mg    d4 _ VAN+PFOS d8 _ V…    10    12     3.55  3.84e-4 5.75e-4 ***         
## 6 mg    d8 _ PFOS     d8 _ V…    12    12    -0.289 7.73e-1 7.73e-1 ns

5.3.3.5 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
  scale_y_continuous(name = "mg",limits = c(0,1.75),breaks = seq(0,1.75,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment", labels = c("PFOS\nday 4","VAN+PFOS\nday 4","PFOS\nday 8","VAN+PFOS\nday 8")) +
  theme(axis.title.x = element_blank())

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(1.635,1.75,1.4,1.52))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # removes all objects, including hidden ones
invisible(gc()) # free up memory; invisible() suppresses the usage report
PFOS amount in mg at day 4 and 8

5.4 Liver day 8

This section prepares and performs the analysis of the PFOS data from liver tissue collected on day 8.

5.4.1 ug/g in liver tissue

5.4.1.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_ugg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_ugg"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable           n    mean     sd
##   <chr>     <fct>          <dbl>   <dbl>  <dbl>
## 1 CTRL      pfos_liver_ugg    12   0.199  0.173
## 2 PFOS      pfos_liver_ugg    12 176.    21.7  
## 3 VAN       pfos_liver_ugg    12   0.205  0.22 
## 4 VAN+PFOS  pfos_liver_ugg    12 196.    18.5

5.4.1.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R11            11 no    no     271.  278.  287.  294.  293.   299
## 2 VAN       R16            28 no    yes    256.  260   268.  275.  273    279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain two outliers: rats R11 (CTRL) and R16 (VAN).
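
For reference, the boxplot rule behind this kind of outlier check can be sketched in base R. This is a minimal illustration and not the rstatix source: values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are flagged as outliers, and values beyond 3 x IQR as extreme. The helper name flag_outliers and the example vector are hypothetical and not part of the project data.

```r
# Hypothetical helper illustrating the boxplot (IQR) outlier rule
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  # TRUE where a value lies outside the fences Q1 - coef*IQR and Q3 + coef*IQR
  x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr
}

# Made-up values, not project data
x <- c(10, 11, 12, 12, 13, 14, 30)
is_outlier <- flag_outliers(x)            # 1.5 x IQR rule
is_extreme <- flag_outliers(x, coef = 3)  # 3 x IQR rule
data.frame(x, is_outlier, is_extreme)
```

Only the last value falls outside both sets of fences in this toy vector.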

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.877 0.000119

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic          p
##   <int> <int>     <dbl>      <dbl>
## 1     3    44      12.2 0.00000631
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the liver PFOS concentration data (ug/g) has two outliers, unequal variances (Levene’s test, p < 0.05) and residuals that are not normally distributed (Shapiro-Wilk test, p < 0.05). We therefore use a non-parametric Kruskal-Wallis test, followed by Dunn’s post-hoc test with FDR adjustment of the p-values.

5.4.1.3 Kruskal-Wallis test

5.4.1.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.                n statistic    df            p method        
## * <chr>          <int>     <dbl> <int>        <dbl> <chr>         
## 1 pfos_liver_ugg    48      36.4     3 0.0000000608 Kruskal-Wallis
5.4.1.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.                n effsize method  magnitude
## * <chr>          <int>   <dbl> <chr>   <ord>    
## 1 pfos_liver_ugg    48   0.760 eta2[H] large
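
As a check on the value above, eta squared based on H is commonly computed as (H - k + 1) / (n - k), where H is the Kruskal-Wallis statistic, k the number of groups and n the total number of observations. The sketch below plugs in the rounded values printed above (H = 36.4, k = 4, n = 48); the function name kw_eta2 is hypothetical.

```r
# Hypothetical helper: eta squared from the Kruskal-Wallis H statistic
kw_eta2 <- function(H, k, n) {
  (H - k + 1) / (n - k)
}

# Rounded inputs taken from the test output above
eta2_ugg <- kw_eta2(H = 36.4, k = 4, n = 48)
round(eta2_ugg, 2)
```

Because H is entered in rounded form, the result agrees with the reported effect size only up to rounding.
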
5.4.1.3.0.3 Post-hoc test if the Kruskal-Wallis test is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_liver_u… CTRL   PFOS      12    12     3.62  2.98e-4 4.47e-4 ***         
## 2 pfos_liver_u… CTRL   VAN       12    12    -0.102 9.19e-1 9.19e-1 ns          
## 3 pfos_liver_u… CTRL   VAN+P…    12    12     4.68  2.85e-6 8.54e-6 ****        
## 4 pfos_liver_u… PFOS   VAN       12    12    -3.72  2.00e-4 4.00e-4 ***         
## 5 pfos_liver_u… PFOS   VAN+P…    12    12     1.06  2.87e-1 3.44e-1 ns          
## 6 pfos_liver_u… VAN    VAN+P…    12    12     4.78  1.72e-6 8.54e-6 ****
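
The "fdr" adjustment (Benjamini-Hochberg) can be reproduced with base R's p.adjust(). The sketch below re-enters the raw p-values from the table above in their printed, rounded form, so the adjusted values should agree with the p.adj column only up to rounding.

```r
# Raw p-values copied (rounded) from the Dunn's test table above,
# in the same row order as the table
p_raw <- c(2.98e-4, 9.19e-1, 2.85e-6, 2.00e-4, 2.87e-1, 1.72e-6)

# "fdr" is an alias for the Benjamini-Hochberg method in p.adjust()
p_fdr <- p.adjust(p_raw, method = "fdr")
signif(p_fdr, 3)
```

Note that the two smallest p-values receive the same adjusted value, because the procedure enforces that adjusted p-values never decrease with rank.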

5.4.1.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "ug/g",limits = c(0,270),breaks = seq(0,270,0.5)) +
  scale_y_break(breaks = c(1,140), scales = 3, ticklabels = c(150,200,250), space = 0.3) +
  theme(axis.title.x = element_blank(),
        axis.line.y.right = element_blank(),
        axis.text.y.right = element_blank(),
        axis.ticks.y.right = element_blank()) +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(235,265,250,235))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
p2

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 107, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 107, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 110, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden ones
invisible(gc()) # frees memory; invisible() suppresses the usage report

PFOS conc. in ug/g at day 8

### mg in liver tissue in all groups

#### Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_mg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable          n  mean    sd
##   <chr>     <fct>         <dbl> <dbl> <dbl>
## 1 CTRL      pfos_liver_mg    12 0.002 0.002
## 2 PFOS      pfos_liver_mg    12 2.11  0.305
## 3 VAN       pfos_liver_mg    12 0.002 0.002
## 4 VAN+PFOS  pfos_liver_mg    12 2.22  0.341

5.4.1.5 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R16            28 no    yes    256.   260  268.  275.   273   279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain one non-extreme outlier (R16).

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic   p.value
##   <chr>                <dbl>     <dbl>
## 1 residuals(model)     0.861 0.0000433

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

  2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic           p
##   <int> <int>     <dbl>       <dbl>
## 1     3    44      16.5 0.000000257
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that mg PFOS in liver across all groups has one outlier, unequal variances (Levene’s test, p < 0.05) and residuals that are not normally distributed (Shapiro-Wilk test, p < 0.05). We therefore use a non-parametric Kruskal-Wallis test, followed by Dunn’s post-hoc test with FDR adjustment of the p-values.

5.4.1.6 Kruskal-Wallis test

5.4.1.6.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.               n statistic    df            p method        
## * <chr>         <int>     <dbl> <int>        <dbl> <chr>         
## 1 pfos_liver_mg    48      35.6     3 0.0000000927 Kruskal-Wallis
5.4.1.6.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.               n effsize method  magnitude
## * <chr>         <int>   <dbl> <chr>   <ord>    
## 1 pfos_liver_mg    48   0.740 eta2[H] large
5.4.1.6.0.3 Post-hoc test if the Kruskal-Wallis test is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 pfos_liver_mg CTRL   PFOS      12    12     3.86  1.12e-4 1.67e-4 ***         
## 2 pfos_liver_mg CTRL   VAN       12    12    -0.146 8.84e-1 8.84e-1 ns          
## 3 pfos_liver_mg CTRL   VAN+P…    12    12     4.39  1.14e-5 3.42e-5 ****        
## 4 pfos_liver_mg PFOS   VAN       12    12    -4.01  6.08e-5 1.22e-4 ***         
## 5 pfos_liver_mg PFOS   VAN+P…    12    12     0.525 6.00e-1 7.20e-1 ns          
## 6 pfos_liver_mg VAN    VAN+P…    12    12     4.53  5.77e-6 3.42e-5 ****

5.4.1.7 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,3.5),breaks = seq(0,3.5,0.01)) +
  scale_y_break(breaks = c(0.01,1), scales = 3, ticklabels = c(1.0,2.0,3.0), space = 0.3) +
  labs(fill = "Treatment") +
  theme(axis.title.x = element_blank(),
        axis.line.y.right = element_blank(),
        axis.text.y.right = element_blank(),
        axis.ticks.y.right = element_blank()) +
  scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(3.2,3,3.4,3.2))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden ones
invisible(gc()) # frees memory; invisible() suppresses the usage report
PFOS conc. in ug/g at day 8

5.4.2 Total mg in liver

5.4.2.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"

# Subset to the PFOS-dosed animals
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_mg))

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.4.2.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
##  [1] treatment         rat_name          ordering          pfos             
##  [5] van               bw_0              bw_1              bw_2             
##  [9] bw_3              bw_4              bw_5              bw_6             
## [13] bw_7              bw_8              bw_gain           cecum_wt         
## [17] cecum_wt_bw       cecum_norm        liver_wt          liver_wt_bw      
## [21] liver_norm        tot_pfos4         blood_vol4_mL     pfos_serum4_ugml 
## [25] pfos_serum4_ug    pfos_serum4_mg    pfos_serum4_pct   tot_pfos8        
## [29] blood_vol8_mL     pfos_serum8_ugml  pfos_serum8_ug    pfos_serum8_mg   
## [33] pfos_serum8_pct   pfos_change48_pct pfos_liver_ugg    pfos_liver_mg    
## [37] pfos_liver_pct    acetic            formic            propanoic        
## [41] m2_propanoic      butanoic          m3_butanoic       pentanoic        
## [45] m4_pentanoic      hexanoic          heptanoic         is.outlier       
## [49] is.extreme       
## <0 rows> (or 0-length row.names)

Extreme outliers may be bad samples or data entry errors. If outliers are present, compare the test result with and without them to determine whether they matter, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. A QQ plot compares the quantiles of the sample with those of a normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable      statistic     p
##   <chr>     <chr>             <dbl> <dbl>
## 1 PFOS      pfos_liver_mg     0.944 0.546
## 2 VAN+PFOS  pfos_liver_mg     0.955 0.711
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)
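
The per-group check can also be sketched in base R with shapiro.test(). The example below uses simulated data rather than the project data; the data frame sim and its column names are assumptions made for illustration only.

```r
# Simulated two-group data (not project data)
set.seed(7)
sim <- data.frame(
  group = rep(c("A", "B"), each = 12),
  value = rnorm(24, mean = 2, sd = 0.3)
)

# Shapiro-Wilk p-value for each group separately
p_by_group <- sapply(
  split(sim$value, sim$group),
  function(v) shapiro.test(v)$p.value
)
p_by_group

# TRUE if no group shows a significant deviation from normality
normal_ok <- all(p_by_group > 0.05)
```

Combining this with a visual QQ plot check, as described above, avoids relying on a single criterion.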

If both Shapiro-Wilk tests give p > 0.05 and/or the points in the QQ plots follow the reference line, the data can be considered normally distributed.

If the data does not follow a normal distribution, run a Wilcoxon rank-sum test instead.

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22     0.142 0.710
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
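
To make the test less of a black box: Levene's test with the median as centre (the default of car::leveneTest(), which levene_test() calls, as the warnings elsewhere in this document show) amounts to a one-way ANOVA on the absolute deviations of each value from its group median. The sketch below shows this in base R on simulated data; sim and its column names are illustrative assumptions.

```r
# Simulated two-group data (not project data)
set.seed(1)
sim <- data.frame(
  group = factor(rep(c("A", "B"), each = 12)),
  value = c(rnorm(12, mean = 2.1, sd = 0.3),
            rnorm(12, mean = 2.2, sd = 0.3))
)

# Absolute deviation of each value from its own group median
sim$abs_dev <- abs(sim$value - ave(sim$value, sim$group, FUN = median))

# One-way ANOVA on the deviations gives the Levene F statistic and p-value
lev <- anova(lm(abs_dev ~ group, data = sim))
lev
```

With two groups of 12 animals the degrees of freedom are 1 and 22, the same as in the output above.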

No outliers were identified. The data is normally distributed and has equal variance. Hence we use a Student’s t-test.

5.4.2.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1   -0.115      2.11      2.22 pfos_l… PFOS   VAN+P…    12    12    -0.870 0.394
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.           group1 group2   effsize    n1    n2 magnitude
## * <chr>         <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_liver_mg PFOS   VAN+PFOS  -0.355    12    12 small
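
The reported effect size and t statistic can be checked by hand. The sketch below uses the rounded group standard deviations from the pfos_liver_mg summary printed earlier (0.305 and 0.341, n = 12 each) and the mean difference from the t-test output (-0.115), so the results agree with the reported values only up to rounding. It assumes the pooled-standard-deviation form of Cohen's d, which applies when equal variances are assumed.

```r
# Values taken (rounded) from the summary and t-test output in this document
n1 <- 12; n2 <- 12
sd1 <- 0.305; sd2 <- 0.341   # group standard deviations (PFOS, VAN+PFOS)
mean_diff <- -0.115          # PFOS minus VAN+PFOS

# Pooled standard deviation of the two groups
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference in units of the pooled standard deviation
cohen_d <- mean_diff / sd_pooled

# Student's t statistic: mean difference over its standard error
t_stat <- mean_diff / (sd_pooled * sqrt(1 / n1 + 1 / n2))

round(c(d = cohen_d, t = t_stat), 2)
```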

5.4.2.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mg PFOS",limits = c(0,3),breaks = seq(0,3,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(3))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden ones
invisible(gc()) # frees memory; invisible() suppresses the usage report
Total mg PFOS in total liver

5.4.3 Pct.

5.4.3.1 Prepare data

This section sets the variables to be used and prepares the data if necessary.

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_pct"
SUBJECT <- "rat_name"

# Subset to the PFOS-dosed animals
dat.clean <- subset(dat, pfos == "yes")

# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_pct))

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

5.4.3.2 Assumptions and preliminary tests

The two-samples t-tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in the two groups

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R26            14 yes   no     238.  246.  248   257   259.   265
## 2 PFOS      R27            15 yes   no     270.  280.  284.  291.  290.   296
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Extreme outliers may be bad samples or data entry errors. If outliers are present, compare the test result with and without them to determine whether they matter, or run a non-parametric Wilcoxon test.

Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. A QQ plot compares the quantiles of the sample with those of a normal distribution.

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.

# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable       statistic     p
##   <chr>     <chr>              <dbl> <dbl>
## 1 PFOS      pfos_liver_pct     0.985 0.997
## 2 VAN+PFOS  pfos_liver_pct     0.937 0.456
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

If both Shapiro-Wilk tests give p > 0.05 and/or the points in the QQ plots follow the reference line, the data can be considered normally distributed.

If the data does not follow a normal distribution, run a Wilcoxon rank-sum test instead.

Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are equal, the p-value should be greater than 0.05.

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22   0.00119 0.973
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

If the p-value of the Levene’s test is significant, it suggests that there is a significant difference between the variances of the two groups. In such case we should use Welch t-test, which doesn’t assume the equality of the two variances (var.equal=FALSE). If the Levene’s test is non-significant we can perform a Student t-test (var.equal=TRUE).
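
The choice between the two variants can be illustrated with base R's t.test(), which takes the same var.equal argument. The sketch below uses simulated data (not project data) and runs both variants side by side; sim and its column names are illustrative assumptions.

```r
# Simulated two-group data (not project data)
set.seed(42)
sim <- data.frame(
  group = rep(c("A", "B"), each = 12),
  value = c(rnorm(12, mean = 35, sd = 4), rnorm(12, mean = 37, sd = 4))
)

# Student t-test: assumes equal variances, df = n1 + n2 - 2
student <- t.test(value ~ group, data = sim, var.equal = TRUE)

# Welch t-test: no equal-variance assumption, df estimated from the data
welch <- t.test(value ~ group, data = sim, var.equal = FALSE)

c(student_df = unname(student$parameter),
  welch_df   = unname(welch$parameter))
```

The Welch degrees of freedom are never larger than those of the Student test, which is why the Welch variant is the more conservative choice when variances differ.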

Two outliers were identified, but the result of the analysis does not change when they are excluded. The data is normally distributed and has equal variance. Hence we use a Student’s t-test.

5.4.3.3 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1    -2.30      35.0      37.3 pfos_l… PFOS   VAN+P…    12    12     -1.46 0.159
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The output provides:

  • .y.: the y variable used in the test.

  • group1,group2: the compared groups in the pairwise tests.

  • statistic: Test statistic used to compute the p-value.

  • df: degrees of freedom.

  • p: p-value.

  • p.adj: the adjusted p-value.

  • method: the statistical test used to compare groups.

  • p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.

  • estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.

  • estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.

  • alternative: a character string describing the alternative hypothesis.

  • conf.low,conf.high: Lower and upper bound on a confidence interval.

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.            group1 group2   effsize    n1    n2 magnitude
## * <chr>          <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 pfos_liver_pct PFOS   VAN+PFOS  -0.595    12    12 moderate

5.4.3.4 Create figure

# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)

# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "% of total dosed PFOS", limits = c(25,45),breaks = seq(25,45,5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")

p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(45))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2

# Plot for saving without legend
p3 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden ones
invisible(gc()) # frees memory; invisible() suppresses the usage report
PFOS serum day 8 in pct. of total

5.5 Total PFOS detected on day 8

This section prepares and analyses the data on total PFOS detected on day 8.

5.5.1 Analysis and Barplot

5.5.1.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Create new dataframe with pfos data
dat.sub <- subset(dat, pfos == "yes")
tmp <- dat.sub[ , c("rat_name","ordering","treatment","tot_pfos8","pfos_serum8_mg","pfos_liver_mg")]

# Calculate the total PFOS measured (liver + serum) and the amount not
# accounted for; the operations are vectorised, so no loop over rats is needed
tmp$total_measured <- tmp$pfos_liver_mg + tmp$pfos_serum8_mg
tmp$leftover <- tmp$tot_pfos8 - tmp$total_measured

# Calculate percentage of dosed PFOS detected in each rat
tmp$pct_det <- tmp$total_measured / tmp$tot_pfos8 * 100
tmp$pct_liver <- tmp$pfos_liver_mg / tmp$tot_pfos8 * 100
tmp$pct_serum <- tmp$pfos_serum8_mg / tmp$tot_pfos8 * 100

print("Group: PFOS")
## [1] "Group: PFOS"
tmp %>% subset(treatment == "PFOS") %>% select(pct_det) %>% summary()
##     pct_det     
##  Min.   :37.73  
##  1st Qu.:42.16  
##  Median :44.60  
##  Mean   :46.83  
##  3rd Qu.:51.56  
##  Max.   :57.59
tmp %>% subset(treatment == "PFOS") %>% select(pct_liver) %>% summary()
##    pct_liver    
##  Min.   :27.05  
##  1st Qu.:33.19  
##  Median :34.93  
##  Mean   :35.04  
##  3rd Qu.:36.71  
##  Max.   :42.56
tmp %>% subset(treatment == "PFOS") %>% select(pct_serum) %>% summary()
##    pct_serum     
##  Min.   : 7.091  
##  1st Qu.: 7.673  
##  Median : 9.778  
##  Mean   :11.789  
##  3rd Qu.:14.862  
##  Max.   :22.244
print("Group: VAN+PFOS")
## [1] "Group: VAN+PFOS"
tmp %>% subset(treatment == "VAN+PFOS") %>% select(pct_det) %>% summary()
##     pct_det     
##  Min.   :41.29  
##  1st Qu.:45.41  
##  Median :46.69  
##  Mean   :47.88  
##  3rd Qu.:49.49  
##  Max.   :60.94
tmp %>% subset(treatment == "VAN+PFOS") %>% select(pct_liver) %>% summary()
##    pct_liver    
##  Min.   :32.68  
##  1st Qu.:34.15  
##  Median :37.31  
##  Mean   :37.34  
##  3rd Qu.:39.29  
##  Max.   :44.04
tmp %>% subset(treatment == "VAN+PFOS") %>% select(pct_serum) %>% summary()
##    pct_serum     
##  Min.   : 7.138  
##  1st Qu.: 8.186  
##  Median : 9.289  
##  Mean   :10.540  
##  3rd Qu.:11.851  
##  Max.   :19.380
# Analysis of significance between treatment groups
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pct_det"
SUBJECT <- "rat_name"

# Subset to a specific variable
dat.clean <- tmp

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 13
##   treatment rat_name ordering tot_pfos8 pfos_serum8_mg pfos_liver_mg
##   <chr>     <chr>       <int>     <dbl>          <dbl>         <dbl>
## 1 VAN+PFOS  R47            47      5.62           1.09          2.34
## # ℹ 7 more variables: total_measured <dbl>, leftover <dbl>, pct_det <dbl>,
## #   pct_liver <dbl>, pct_serum <dbl>, is.outlier <lgl>, is.extreme <lgl>
# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable statistic      p
##   <chr>     <chr>        <dbl>  <dbl>
## 1 PFOS      pct_det      0.945 0.570 
## 2 VAN+PFOS  pct_det      0.879 0.0853
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22     0.946 0.341
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

The data contain one non-extreme outlier, are normally distributed (Shapiro-Wilk p > 0.05 in both groups), and have equal variance (Levene's test p = 0.341). We therefore perform an unpaired two-tailed t-test.
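The reasoning from assumption checks to test choice can be written down as a small decision rule. The helper below is hypothetical and not part of the analysis. The t-test branches mirror how `EQUAL.VAR` is passed to `var.equal`, while the rank-based fallback for non-normal data is an assumption added here for completeness.

```r
# Hypothetical helper: pick a two-group test from assumption-check p-values
choose_two_group_test <- function(shapiro_p, levene_p, alpha = 0.05) {
  normal    <- all(shapiro_p > alpha)  # every group passes Shapiro-Wilk
  equal_var <- levene_p > alpha        # Levene's test is not significant
  if (!normal) {
    "Wilcoxon rank-sum test"           # assumed fallback, not used in this report
  } else if (equal_var) {
    "Student's t-test"
  } else {
    "Welch's t-test"
  }
}

# p-values reported above for pct_det
choice <- choose_two_group_test(shapiro_p = c(0.570, 0.0853), levene_p = 0.341)
choice
```

With the values printed above, the rule selects Student's t-test, which matches the choice made in the text.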

5.5.1.2 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1    -1.05      46.8      47.9 pct_det PFOS   VAN+P…    12    12    -0.447 0.659
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The t-test shows no significant difference between the two groups in the total percentage of detected PFOS, where 100% is the total dose (p = 0.659).

5.5.1.3 Create barplots

# Prepare data columns for barplot
tmp2 <- rbind(data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$leftover, "type" = "Unaccounted"),
              data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_serum8_mg, "type" = "PFOS serum"),
              data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_liver_mg, "type" = "PFOS liver"))

# Create plot per rat
p <- ggplot(tmp2, aes(x = rat_name, y = mg, fill = fct_rev(type))) +
  geom_bar(position = "fill", stat = "identity") +
  theme_pubr(legend = "top") +
  facet_grid(~ treatment, scales = "free_x") +
  labs(fill = "Sample type", x = "Rats", y = "% of total dosed") +
  scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FECE00", "PFOS serum" = "#cf200D")) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)

# Create plot for average per treatment group
p <- ggplot(tmp2, aes(x = treatment, y = mg, fill = fct_rev(type))) +
  geom_bar(position = "fill", stat = "identity") +
  theme_pubr(legend = "top") +
  labs(fill = "Sample type", x = "Treatment",y = "% of total dosed") +
  scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FECE00", "PFOS serum" = "#cf200D")) +
  theme(axis.ticks.x=element_blank()) +
  scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) # remove all objects, including hidden ones
invisible(gc()) # run garbage collection to free memory; output is suppressed
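A note on reading the per-treatment barplot above: with `position = "fill"`, the stacked values of all rats in a group are normalised together. As far as I understand it, each segment therefore shows a ratio of sums (a dose-weighted average), which can differ from the mean of the per-rat percentages reported by `summary()`. The sketch below uses two made-up rats to show the difference; the values are hypothetical.

```r
# Hypothetical two-rat group with different total doses (mg)
toy <- data.frame(
  tot_pfos8     = c(4, 6),
  pfos_liver_mg = c(2.0, 1.8)
)

# What a "fill" bar displays: liver PFOS summed over rats / dose summed over rats
ratio_of_sums <- sum(toy$pfos_liver_mg) / sum(toy$tot_pfos8) * 100

# Mean of the per-rat percentages, as in the summary() output
mean_of_ratios <- mean(toy$pfos_liver_mg / toy$tot_pfos8 * 100)

c(ratio_of_sums = ratio_of_sums, mean_of_ratios = mean_of_ratios)
```

The two quantities coincide only when every rat received the same total dose, so small discrepancies between the bar heights and the group means are expected.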

5.6 Liver-to-serum (µg/g / µg/mL) ratio

5.6.1 Analysis and boxplot

5.6.1.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Create new dataframe with pfos data
dat.sub <- subset(dat, pfos == "yes")
tmp <- dat.sub[ , c("rat_name","ordering","treatment","pfos_serum8_ugml","pfos_liver_ugg")]

# Calculate liver-to-serum ratio
tmp$ls_ratio <- tmp$pfos_liver_ugg / tmp$pfos_serum8_ugml

print("Group: PFOS")
## [1] "Group: PFOS"
tmp %>% subset(treatment == "PFOS") %>% select(ls_ratio) %>% summary()
##     ls_ratio    
##  Min.   :2.088  
##  1st Qu.:4.267  
##  Median :5.539  
##  Mean   :5.564  
##  3rd Qu.:6.958  
##  Max.   :8.401
print("Group: VAN+PFOS")
## [1] "Group: VAN+PFOS"
tmp %>% subset(treatment == "VAN+PFOS") %>% select(ls_ratio) %>% summary()
##     ls_ratio     
##  Min.   : 3.427  
##  1st Qu.: 5.279  
##  Median : 6.466  
##  Mean   : 6.618  
##  3rd Qu.: 7.949  
##  Max.   :10.162
tmp %>% group_by(across(all_of("treatment"))) %>% get_summary_stats(!!sym("ls_ratio"), type = "mean_sd")
## # A tibble: 2 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 PFOS      ls_ratio    12  5.56  1.91
## 2 VAN+PFOS  ls_ratio    12  6.62  1.95
# Analysis of significance between treatment groups
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "ls_ratio"
SUBJECT <- "rat_name"

# Subset to a specific variable
dat.clean <- tmp

# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE

# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))

# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  
  # Remove unpaired samples
  dat.clean <- dat.clean %>% 
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>% 
    ungroup()
}

# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## [1] treatment        rat_name         ordering         pfos_serum8_ugml
## [5] pfos_liver_ugg   ls_ratio         is.outlier       is.extreme      
## <0 rows> (or 0-length row.names)
# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable statistic     p
##   <chr>     <chr>        <dbl> <dbl>
## 1 PFOS      ls_ratio     0.972 0.932
## 2 VAN+PFOS  ls_ratio     0.973 0.940
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

# Run test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22    0.0217 0.884
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

The data contain no outliers, are normally distributed, and have equal variance. We therefore perform an unpaired two-tailed t-test.

5.6.1.2 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
##      <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1    -1.05      5.56      6.62 ls_rat… PFOS   VAN+P…    12    12     -1.34 0.195
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

The t-test shows no significant difference in the liver-to-serum PFOS ratio between the two groups (p = 0.195).
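As a sanity check, the pooled-variance t statistic can be recomputed by hand from the group summaries printed earlier in this section. The sketch below uses the rounded means and standard deviations, so it only approximately reproduces the `t_test()` output; the variable names are chosen here for illustration.

```r
# Group summaries as printed above (rounded, so results are approximate)
m1 <- 5.56; s1 <- 1.91; n1 <- 12  # PFOS
m2 <- 6.62; s2 <- 1.95; n2 <- 12  # VAN+PFOS

# Pooled variance and standard error of the difference in means
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Student's t statistic, degrees of freedom and two-sided p-value
t_stat <- (m1 - m2) / se
df     <- n1 + n2 - 2
p_val  <- 2 * pt(-abs(t_stat), df)

round(c(t = t_stat, df = df, p = p_val), 3)
```

The result should land close to the reported statistic of -1.34 and p-value of 0.195, with small deviations caused by rounding of the inputs.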

5.6.1.3 Create boxplot

p <- ggboxplot(dat.clean, x = "treatment", y = "ls_ratio",
               fill = "treatment",
               add = "jitter",
               add.params = list(size = 1)) +
  theme_pubr(legend = "top") +
  scale_fill_manual(values = params$COL) +
  theme(axis.title.x = element_blank()) +
  labs(fill = "Treatment", y = "Liver-to-serum PFOS ratio") +
  stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(11))
p

p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_boxplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_boxplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 64, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) # remove all objects, including hidden ones
invisible(gc()) # run garbage collection to free memory; output is suppressed

6 PFOS ISOMER ANALYSIS

In this section we investigate linear (l-PFOS) and branched PFOS (br-PFOS) by a ratio of the two based on calculated peak area expressed as “bl-ratio”. These data are obtained from quantitative mass spectrometry analysis on retention-time peaks for br-PFOS and l-PFOS, respectively.

The factors investigated are sample type (material: “Serum”, “Liver”), day of measurement (day: “d4”, “d8”; serum only), treatment group (treatment: “PFOS”, “VAN+PFOS”, “Control”, where “Control” denotes spiked controls), and the difference in bl-ratio between samples and solvent controls spiked with the same batch of PFOS used for oral gavage (type: “Sample”, “Control”).

6.1 Import data

Data is imported from CSV format and bl-ratios are calculated by br-PFOS / l-PFOS.

# Load analysis data
dat <- read.csv("input/pfos_isomer_data.csv", header = TRUE, sep = ";", dec = ",")

# Calculate branched-linear PFOS ratio from peak areas
dat$bl_ratio <- dat$area_branch / dat$area_linear
# Create common predictor for later analysis
dat$mat_treat <- paste0(dat$material,"_",dat$treatment)

save(dat, file = "R_objects/pfos_isomer_data.Rdata")

6.2 Prepare Serum data

Investigation of bl-ratio in treatment groups in serum samples on Day 4 and Day 8.

#  Load data
load("R_objects/pfos_isomer_data.Rdata")

# Subset
dat.clean <- subset(dat, material == "Serum" & !treatment == "Control")

# Set names of variables
PREDICTOR <- c("day","treatment")#c("treatment","pfos","van")
OUTCOME <- "bl_ratio"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 6
##   day   treatment variable     n  mean    sd
##   <chr> <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 d4    PFOS      bl_ratio    12 0.22  0.018
## 2 d4    VAN+PFOS  bl_ratio    10 0.241 0.021
## 3 d8    PFOS      bl_ratio    12 0.222 0.015
## 4 d8    VAN+PFOS  bl_ratio    12 0.211 0.012
# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 22
##   day   treatment id           material order type   rat_name is_area rt_branch
##   <chr> <chr>     <chr>        <chr>    <chr> <chr>  <chr>      <int>     <dbl>
## 1 d8    PFOS      Serum_R31_d8 Serum    A     Sample R31        25554      10.3
## # ℹ 13 more variables: area_branch <int>, amt_branch <dbl>, art_branch <dbl>,
## #   rt_total <dbl>, area_total <int>, amt_total <dbl>, art_total <dbl>,
## #   area_linear <int>, amt_linear <dbl>, bl_ratio <dbl>, mat_treat <chr>,
## #   is.outlier <lgl>, is.extreme <lgl>
# Check normality
# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.975   0.435
# Check the homogeneity of variances with Levene's test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    42      1.06 0.376
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

This shows that the data have one outlier, are normally distributed, and have equal variance. Because the model crosses two factors (day and treatment), we test the data with a two-way ANOVA followed by Tukey’s honest significant difference test.

6.2.1 Two-way ANOVA test

6.2.1.1 Perform test

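With equal variance we run a two-way ANOVA with anova_test(). The code falls back to welch_anova_test() if the variances differ; note that, to our understanding, Welch's ANOVA is a one-way procedure, so that branch would need a single combined grouping factor rather than the crossed formula.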
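The formula used here is built by collapsing the predictors with `*`, which in R expands to both main effects plus their interaction. The short sketch below repeats that construction on its own and lists the resulting model terms with base R's `terms()`; it needs no data.

```r
PREDICTOR <- c("day", "treatment")
OUTCOME   <- "bl_ratio"

# Same construction as in the analysis code above
FORMULA <- as.formula(paste(OUTCOME, paste(PREDICTOR, collapse = "*"), sep = " ~ "))

# Terms the model will estimate
term_labels <- attr(terms(FORMULA), "term.labels")
term_labels
```

The three terms (day, treatment, day:treatment) correspond to the three rows of the ANOVA table.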

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##          Effect DFn DFd      F     p p<.05   ges
## 1           day   1  42  6.954 0.012     * 0.142
## 2     treatment   1  42  0.792 0.379       0.019
## 3 day:treatment   1  42 10.562 0.002     * 0.201

6.2.1.2 Perform posthoc test

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 8 × 9
##   term          group1     group2 null.value estimate conf.low conf.high   p.adj
## * <chr>         <chr>      <chr>       <dbl>    <dbl>    <dbl>     <dbl>   <dbl>
## 1 day           d4         d8              0 -0.0128  -2.27e-2  -0.00286 1.28e-2
## 2 treatment     PFOS       VAN+P…          0  0.00437 -5.55e-3   0.0143  3.79e-1
## 3 day:treatment d4:PFOS    d8:PF…          0  0.00227 -1.59e-2   0.0204  9.87e-1
## 4 day:treatment d4:PFOS    d4:VA…          0  0.0211   2.08e-3   0.0402  2.46e-2
## 5 day:treatment d4:PFOS    d8:VA…          0 -0.00859 -2.68e-2   0.00958 5.9 e-1
## 6 day:treatment d8:PFOS    d4:VA…          0  0.0189  -1.95e-4   0.0379  5.33e-2
## 7 day:treatment d8:PFOS    d8:VA…          0 -0.0109  -2.90e-2   0.00731 3.9 e-1
## 8 day:treatment d4:VAN+PF… d8:VA…          0 -0.0297  -4.88e-2  -0.0107  8.26e-4
## # ℹ 1 more variable: p.adj.signif <chr>

A significant difference is observed between days within the VAN+PFOS group (p.adj < 0.001) and between treatment groups on Day 4 (p.adj = 0.025). We plot this as a nested analysis with pairwise t-tests on the inner variable (day) and the outer variable (treatment).
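The number of rows in the post hoc table follows from the design. For a crossed two-factor model, Tukey's test reports all pairs of levels for each main effect and all pairs of cells for the interaction. The helper below is a hypothetical illustration of that count and assumes every cell contains data.

```r
# Number of pairwise comparisons reported for a crossed two-factor model
n_tukey_rows <- function(n_levels_a, n_levels_b) {
  choose(n_levels_a, 2) +               # pairs within main effect A
    choose(n_levels_b, 2) +             # pairs within main effect B
    choose(n_levels_a * n_levels_b, 2)  # cell-vs-cell pairs in A:B
}

# day (d4, d8) x treatment (PFOS, VAN+PFOS)
n_rows <- n_tukey_rows(2, 2)
n_rows
```

For the 2 x 2 serum design this gives the 8 rows shown above. Only a subset of the interaction rows (same day or same treatment) is biologically meaningful, which is why the figure reports targeted pairwise tests.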

6.2.1.3 Create figure

## Pairwise comparison for inner variable: day
stat.in <- dat.clean %>%
  group_by(treatment) %>%
  t_test(bl_ratio ~ day, 
         paired = FALSE, var.equal = EQUAL.VAR, 
         detailed = TRUE, alternative = "two.sided") %>%
  add_significance() %>%
  p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.in
## # A tibble: 2 × 18
##   treatment estimate estimate1 estimate2 .y.      group1 group2    n1    n2
## * <chr>        <dbl>     <dbl>     <dbl> <chr>    <chr>  <chr>  <int> <int>
## 1 PFOS      -0.00227     0.220     0.222 bl_ratio d4     d8        12    12
## 2 VAN+PFOS   0.0297      0.241     0.211 bl_ratio d4     d8        10    12
## # ℹ 9 more variables: statistic <dbl>, p <dbl>, df <dbl>, conf.low <dbl>,
## #   conf.high <dbl>, method <chr>, alternative <chr>, p.signif <chr>,
## #   p.format <chr>
## Pairwise comparison for outer variable: treatment
stat.out <- dat.clean %>%
  t_test(bl_ratio ~ treatment,
         paired = FALSE, var.equal = EQUAL.VAR,
         detailed = TRUE, alternative = "two.sided") %>%
  add_significance() %>%
  p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.out
## # A tibble: 1 × 17
##   estimate estimate1 estimate2 .y.     group1 group2    n1    n2 statistic     p
## *    <dbl>     <dbl>     <dbl> <chr>   <chr>  <chr>  <int> <int>     <dbl> <dbl>
## 1 -0.00379     0.221     0.225 bl_rat… PFOS   VAN+P…    24    22    -0.663 0.511
## # ℹ 7 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>, p.format <chr>
## Calculate positions statistics on plot
stat.in <- stat.in %>% add_xy_position(x = "treatment", dodge = 0.8)
stat.out <- stat.out %>% add_xy_position(x = "treatment")
stat.out$y.position <- max(stat.in$y.position)*1.03

# Create plot
p <- ggboxplot(dat.clean, x = "treatment", y = "bl_ratio",
               fill = "day",
               color = "day",
               add = "jitter",
               add.params = list(size = 1)) +
  theme_pubr(legend = "top") +
  scale_color_manual(values = c("d4" = "black","d8" = "black")) +
  scale_fill_manual(values = c("#ffffff","#aaaaaa"), name = "Day", labels = c("4","8")) +
  scale_y_continuous(name = "Serum B/L ratio", limits = c(0.15,0.3), breaks = seq(0.15,0.3,0.05)) +
  theme(axis.title.x = element_blank()) +
  guides(color = "none")

p.stat <- p + stat_pvalue_manual(stat.in, tip.length = 0, hide.ns = FALSE) +
  stat_pvalue_manual(stat.out, tip.length = 0, hide.ns = FALSE)
p.stat

suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_serum.pdf", plot = p.stat, device = "pdf", dpi = 300, units = "mm", height = 100, width = 100))
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_serum.png", plot = p.stat, device = "png", dpi = 300, units = "mm", height = 100, width = 100))

6.3 Prepare Liver data

Investigation of bl-ratio in treatment groups in liver samples. These are only tested for Day 8, as this was the only sampling day for liver.

#  Load data
load("R_objects/pfos_isomer_data.Rdata")

# Subset
dat.clean <- subset(dat, material == "Liver" & !treatment == "Control")

# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "bl_ratio"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 2 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 PFOS      bl_ratio    12 0.172 0.013
## 2 VAN+PFOS  bl_ratio    12 0.18  0.009
# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 22
##   treatment id           material order type   day   rat_name is_area rt_branch
##   <chr>     <chr>        <chr>    <chr> <chr>  <chr> <chr>      <int>     <dbl>
## 1 PFOS      Liver_R29_d8 Liver    B     Sample d8    R29        19478      10.3
## 2 PFOS      Liver_R36_d8 Liver    B     Sample d8    R36        18635      10.3
## 3 VAN+PFOS  Liver_R41_d8 Liver    B     Sample d8    R41        19421      10.3
## # ℹ 13 more variables: area_branch <int>, amt_branch <dbl>, art_branch <dbl>,
## #   rt_total <dbl>, area_total <int>, amt_total <dbl>, art_total <dbl>,
## #   area_linear <int>, amt_linear <dbl>, bl_ratio <dbl>, mat_treat <chr>,
## #   is.outlier <lgl>, is.extreme <lgl>
# Check normality
# Run Shapiro test
dat.clean %>% 
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
##   treatment variable statistic     p
##   <chr>     <chr>        <dbl> <dbl>
## 1 PFOS      bl_ratio     0.923 0.312
## 2 VAN+PFOS  bl_ratio     0.901 0.165
# Create QQplot
ggqqplot(dat.clean, x = OUTCOME, facet.by = PREDICTOR)

# Check the homogeneity of variances with Levene's test
# Run test
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22     0.675 0.420
# Save output
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

Three outliers were identified. The data are normally distributed and have equal variance. Hence we use an unpaired two-tailed t-test.

6.3.1 PERFORM TEST

T-test
We are now ready to perform the test

stat.test <- dat.clean %>% 
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
##   estimate estimate1 estimate2 .y.    group1 group2    n1    n2 statistic      p
##      <dbl>     <dbl>     <dbl> <chr>  <chr>  <chr>  <int> <int>     <dbl>  <dbl>
## 1 -0.00847     0.172     0.180 bl_ra… PFOS   VAN+P…    12    12     -1.82 0.0817
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, p.signif <chr>

Effect size
The effect size is calculated as Cohen’s d.

dat.clean %>% cohens_d(FORMULA, 
                       var.equal = EQUAL.VAR,
                       paired = FALSE)
## # A tibble: 1 × 7
##   .y.      group1 group2   effsize    n1    n2 magnitude
## * <chr>    <chr>  <chr>      <dbl> <int> <int> <ord>    
## 1 bl_ratio PFOS   VAN+PFOS  -0.745    12    12 moderate

No significant difference is observed between treatment groups in the liver samples (p = 0.082), although the effect size is moderate (Cohen’s d = -0.745). We present this in a plot using the above statistics.
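For an unpaired comparison with pooled variance, Cohen's d and the t statistic are linked by d = t * sqrt(1/n1 + 1/n2), which gives a quick consistency check between the two outputs above. The function name below is hypothetical, and the relation holds only for the pooled-variance case.

```r
# Cohen's d from a pooled-variance t statistic: d = t * sqrt(1/n1 + 1/n2)
d_from_t <- function(t_stat, n1, n2) t_stat * sqrt(1 / n1 + 1 / n2)

# t statistic and group sizes as printed above (rounded)
d_est <- d_from_t(t_stat = -1.82, n1 = 12, n2 = 12)
round(d_est, 3)
```

The result agrees with the reported effect size of -0.745 up to rounding of the t statistic.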

6.3.2 Create figure

# Load plotting parameters (colour palette); the environment was cleared earlier
params <- readRDS("R_objects/animal_params.RDS")

# Create plot
p <- ggboxplot(dat.clean, x = "treatment", y = "bl_ratio",
               fill = "treatment",
               add = "jitter",
               add.params = list(size = 1)) +
  theme_pubr(legend = "top") +
  scale_fill_manual(values = params$COL, name = "Treatment") +
  scale_y_continuous(name = "Liver B/L ratio", limits = c(0.15,0.3), breaks = seq(0.15,0.3,0.05)) +
  theme(axis.title.x = element_blank()) +
  stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(0.22))
p

suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_liver.pdf", plot = p, device = "pdf", dpi = 300, units = "mm", height = 100, width = 60))
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_liver.png", plot = p, device = "png", dpi = 300, units = "mm", height = 100, width = 60))

6.4 Prepare test of “material” day 8

Here we test differences in bl-ratio between liver and serum on Day 8. Day 4 is excluded because there are no liver data from that day to compare with. From the serum analysis above, Day 4 serum has on average a slightly higher bl-ratio than Day 8 serum, and both are higher than liver, so any statistically significant difference between materials on Day 8 also applies to the Day 4 serum samples. The samples presented here include spiked negative controls and solvent controls, all of which had the same batch of PFOS added directly to the sample before analysis. These controls reflect the batch proportion of l-PFOS to br-PFOS.

load("R_objects/pfos_isomer_data.Rdata")

# Subset
dat.clean <- subset(dat, day == "d8") #!treatment == "Control" &

# Set names of variables
PREDICTOR <- c("material","treatment")
OUTCOME <- "bl_ratio"
SUBJECT <- "rat_name"

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 6 × 6
##   material treatment variable     n  mean    sd
##   <chr>    <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 Liver    Control   bl_ratio    12 0.075 0.009
## 2 Liver    PFOS      bl_ratio    12 0.172 0.013
## 3 Liver    VAN+PFOS  bl_ratio    12 0.18  0.009
## 4 Serum    Control   bl_ratio     4 0.067 0.014
## 5 Serum    PFOS      bl_ratio    12 0.222 0.015
## 6 Serum    VAN+PFOS  bl_ratio    12 0.211 0.012
# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 4 × 22
##   material treatment id           order type   day   rat_name is_area rt_branch
##   <chr>    <chr>     <chr>        <chr> <chr>  <chr> <chr>      <int>     <dbl>
## 1 Liver    PFOS      Liver_R29_d8 B     Sample d8    R29        19478      10.3
## 2 Liver    PFOS      Liver_R36_d8 B     Sample d8    R36        18635      10.3
## 3 Liver    VAN+PFOS  Liver_R41_d8 B     Sample d8    R41        19421      10.3
## 4 Serum    PFOS      Serum_R31_d8 A     Sample d8    R31        25554      10.3
## # ℹ 13 more variables: area_branch <int>, amt_branch <dbl>, art_branch <dbl>,
## #   rt_total <dbl>, area_total <int>, amt_total <dbl>, art_total <dbl>,
## #   area_linear <int>, amt_linear <dbl>, bl_ratio <dbl>, mat_treat <chr>,
## #   is.outlier <lgl>, is.extreme <lgl>
# Check normality
# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.970   0.117
# Check the homogeneity of variances with Levene's test
dat.clean %>% levene_test(FORMULA)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     5    58     0.433 0.824
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05

This shows that the data have four non-critical outliers, are normally distributed, and have equal variance. Because the model crosses two factors (material and treatment), we test the data with a two-way ANOVA followed by Tukey’s honest significant difference test.

6.4.1 Two-way ANOVA test

6.4.1.1 Perform test

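With equal variance we run a two-way ANOVA with anova_test(). The code falls back to welch_anova_test() if the variances differ; note that, to our understanding, Welch's ANOVA is a one-way procedure, so that branch would need a single combined grouping factor rather than the crossed formula.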

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##               Effect DFn DFd       F        p p<.05   ges
## 1           material   1  58  97.473 4.98e-14     * 0.627
## 2          treatment   2  58 520.433 8.95e-38     * 0.947
## 3 material:treatment   2  58  23.081 4.22e-08     * 0.443

6.4.1.2 Perform posthoc test

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 19 × 9
##    term            group1 group2 null.value estimate conf.low conf.high    p.adj
##  * <chr>           <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
##  1 material        Liver  Serum           0  0.0530   0.0469    0.0591  1.49e-11
##  2 treatment       Contr… PFOS            0  0.111    0.101     0.120   1.49e-11
##  3 treatment       Contr… VAN+P…          0  0.110    0.100     0.119   1.49e-11
##  4 treatment       PFOS   VAN+P…          0 -0.00120 -0.00962   0.00723 9.38e- 1
##  5 material:treat… Liver… Serum…          0 -0.00797 -0.0286    0.0127  8.63e- 1
##  6 material:treat… Liver… Liver…          0  0.0970   0.0824    0.112   1.49e-11
##  7 material:treat… Liver… Serum…          0  0.147    0.133     0.162   1.49e-11
##  8 material:treat… Liver… Liver…          0  0.105    0.0909    0.120   1.49e-11
##  9 material:treat… Liver… Serum…          0  0.136    0.122     0.151   1.49e-11
## 10 material:treat… Serum… Liver…          0  0.105    0.0843    0.126   1.49e-11
## 11 material:treat… Serum… Serum…          0  0.155    0.135     0.176   1.49e-11
## 12 material:treat… Serum… Liver…          0  0.113    0.0928    0.134   1.49e-11
## 13 material:treat… Serum… Serum…          0  0.144    0.124     0.165   1.49e-11
## 14 material:treat… Liver… Serum…          0  0.0503   0.0357    0.0649  1.52e-11
## 15 material:treat… Liver… Liver…          0  0.00847 -0.00613   0.0231  5.31e- 1
## 16 material:treat… Liver… Serum…          0  0.0395   0.0249    0.0541  1.05e- 9
## 17 material:treat… Serum… Liver…          0 -0.0419  -0.0565   -0.0273  1.77e-10
## 18 material:treat… Serum… Serum…          0 -0.0109  -0.0255    0.00374 2.57e- 1
## 19 material:treat… Liver… Serum…          0  0.0310   0.0164    0.0456  7.53e- 7
## # ℹ 1 more variable: p.adj.signif <chr>

A significant overall difference is observed between liver and serum, as well as between the spiked controls and both treatment groups for each material. The only non-significant comparisons are between the controls run with each material, and between the PFOS and VAN+PFOS groups, both overall and within serum and liver, respectively. We present these data as a nested plot with material (serum, liver) as the inner grouping and treatment (PFOS, VAN+PFOS, Control) as the outer grouping, with accompanying t-tests and ANOVA with Tukey’s test.

6.4.2 Create figure

## Pairwise comparison for inner variable
stat.in <- dat.clean %>%
  group_by(treatment) %>%
  t_test(bl_ratio ~ order, 
         paired = FALSE, var.equal = EQUAL.VAR, 
         detailed = TRUE, alternative = "two.sided") %>%
  add_significance() %>%
  p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.in
## # A tibble: 3 × 18
##   treatment estimate estimate1 estimate2 .y.      group1 group2    n1    n2
## * <chr>        <dbl>     <dbl>     <dbl> <chr>    <chr>  <chr>  <int> <int>
## 1 Control   -0.00797    0.0668    0.0747 bl_ratio A      B          4    12
## 2 PFOS       0.0503     0.222     0.172  bl_ratio A      B         12    12
## 3 VAN+PFOS   0.0310     0.211     0.180  bl_ratio A      B         12    12
## # ℹ 9 more variables: statistic <dbl>, p <dbl>, df <dbl>, conf.low <dbl>,
## #   conf.high <dbl>, method <chr>, alternative <chr>, p.signif <chr>,
## #   p.format <chr>
## Pairwise comparison for outer variable
stat.out <- dat.clean %>%
  anova_test(bl_ratio ~ treatment) %>%
  add_significance() %>%
  p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.out
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd       F        p p<.05  ges p.signif p.format
## 1 treatment   2  61 188.131 8.13e-27     * 0.86     ****   <0.001
pwc2 <- dat.clean %>%
  tukey_hsd(bl_ratio ~ treatment) %>%
  add_significance() %>%
  p_format("p.adj", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
pwc2
## # A tibble: 3 × 10
##   term      group1  group2   null.value estimate conf.low conf.high    p.adj
## * <chr>     <chr>   <chr>         <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
## 1 treatment Control PFOS              0  0.124     0.107     0.141  2.03e-11
## 2 treatment Control VAN+PFOS          0  0.123     0.106     0.140  2.03e-11
## 3 treatment PFOS    VAN+PFOS          0 -0.00120  -0.0165    0.0141 9.81e- 1
## # ℹ 2 more variables: p.adj.signif <chr>, p.adj.format <chr>
## Calculate positions of the statistics on the plot
stat.in <- stat.in %>% add_xy_position(x = "treatment", dodge = 0.8)
pwc2 <- pwc2 %>% add_xy_position(x = "treatment")
pwc2$y.position <- max(stat.in$y.position)*1.1

# Create plot
p <- ggboxplot(dat.clean, x = "treatment", y = "bl_ratio",
               fill = "order",
               color = "order",
               add = "jitter",
               add.params = list(size = 1)) +
  theme_pubr(legend = "top") +
  scale_color_manual(values = c("A" = "black","B" = "black")) +
  scale_fill_manual(values = c("B" = "#FECE00", "A" = "#cf200D"), name = "Sample type", labels = c("A" = "Serum", "B" = "Liver")) +
  scale_y_continuous(name = "B/L ratio", limits = c(0.05,0.3), breaks = seq(0.05,0.3,0.05)) +
  theme(axis.title.x = element_blank()) +
  guides(color = "none")

p.stat <- p + stat_pvalue_manual(stat.in, label = "p.signif", tip.length = 0, hide.ns = FALSE, y.position = c(0.11,0.25,0.25)) +
  stat_pvalue_manual(pwc2, label = "p.adj.signif", tip.length = 0, hide.ns = FALSE, y.position = c(0.27,0.30,0.285))
p.stat

suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_material.pdf", plot = p.stat, device = "pdf", dpi = 300, units = "mm", height = 100, width = 136))
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_material.png", plot = p.stat, device = "png", dpi = 300, units = "mm", height = 100, width = 136))

7 SHORT CHAIN FATTY ACID DATA

This section covers the analysis of short-chain fatty acids (SCFAs) in colonic samples collected on day 8. Ten SCFAs were quantified by MS Omics A/S (Denmark), delivered as concentrations in millimolar (mM), and are tested accordingly here.

Concentrations in mM were recorded from proximal colonic samples collected from all animals at dissection.

The following analyses include an overall principal coordinate analysis (PCoA) with PERMANOVA and individual boxplots comparing compound concentrations between treatment groups.
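
As a minimal sketch of how a Bray-Curtis PCoA and a PERMANOVA fit together, the example below runs both on simulated concentrations. The data frame `toy`, the grouping `meta$group` and all values are hypothetical and are not project data; `cmdscale()` is used here for the ordination purely for illustration, whereas the analysis below uses `capscale()`.

```r
# Minimal sketch: Bray-Curtis PCoA and PERMANOVA on simulated concentrations.
# All object names and values are hypothetical, not project data.
library(vegan)

set.seed(1)
n <- 6  # simulated animals per group

# Two simulated groups; abs() guarantees non-negative concentrations
toy <- data.frame(
  acetic    = abs(c(rnorm(n, 8, 1),      rnorm(n, 3, 0.5))),
  butanoic  = abs(c(rnorm(n, 1.5, 0.3),  rnorm(n, 0.2, 0.05))),
  propanoic = abs(c(rnorm(n, 0.3, 0.05), rnorm(n, 0.3, 0.05)))
)
meta <- data.frame(group = factor(rep(c("no_van", "van"), each = n)))

# Bray-Curtis dissimilarities between samples
d <- vegdist(toy, method = "bray")

# PCoA (classical multidimensional scaling) on the dissimilarity matrix
pcoa <- cmdscale(d, k = 2, eig = TRUE)
head(pcoa$points)

# PERMANOVA: share of between-sample dissimilarity explained by group
perm <- adonis2(d ~ group, data = meta, permutations = 999)
perm
```

The ordination describes how samples spread in two dimensions, while the PERMANOVA tests whether group membership explains that spread; the two are usually reported together.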

7.1 PCOA AND PERMANOVA ANALYSIS OF SCFA

# Load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

dat.clean <- dat %>% select(rat_name, treatment, pfos, van,
  acetic, propanoic, m2_propanoic, butanoic, m3_butanoic, pentanoic, hexanoic,
  formic, m4_pentanoic, heptanoic) # formic, m4_pentanoic and heptanoic are left out of the SCFA table below due to low sample count

# Remove sample with mostly missing values
dat.clean <- subset(dat.clean, rat_name != "R08") # Mostly <LOD; suspected error in sample handling

# Create SCFA table
row.names(dat.clean) <- dat.clean$rat_name
dat.SCFA <- dat.clean %>% select(acetic, propanoic, m2_propanoic, butanoic, m3_butanoic, pentanoic, hexanoic)
dat.SCFA
##        acetic propanoic m2_propanoic   butanoic m3_butanoic  pentanoic
## R01  7.868665 0.3921931   0.10713169 1.71092008  0.06609145 0.05975663
## R02  7.529961 0.2648977   0.03172203 0.83718815  0.01943444 0.03931031
## R03  7.334644 0.3076036   0.04957959 1.55126305  0.02795564 0.07331600
## R04  7.211624 0.4002376   0.05054027 1.40421430  0.03441629 0.03030373
## R05  9.522715 0.2276087   0.04910177 2.42728491  0.02626586 0.04521286
## R06  6.277653 0.2297736   0.03902360 1.21216936  0.02140372 0.05487809
## R07  7.695140 0.2805200   0.07144139 1.92250223  0.04966964 0.06921387
## R09  7.035536 0.1942356   0.03438163 0.70002530  0.01989101 0.02446537
## R10 10.465522 0.3485516   0.06152928 1.33044438  0.04261478 0.06032153
## R11  8.556428 0.3814438   0.04937821 1.65193422  0.02931262 0.07490580
## R12  6.799043 0.1931678   0.04667801 0.64906530  0.02665150 0.02992233
## R25  9.315862 0.3044661   0.04532504 1.97734359  0.02870620 0.03622949
## R26 11.183570 0.3442234   0.08032019 2.17634348  0.05628301 0.10358275
## R27  3.818461 0.1384521   0.02690980 0.34833126  0.01873754 0.01690413
## R28  8.430942 0.4858121   0.07477591 1.27083868  0.05009847 0.06523660
## R29  4.116468 0.1978577   0.04098487 0.46145414  0.03295323 0.02816228
## R30  3.503725 0.1516586   0.03871071 0.25109881  0.02896771 0.03837706
## R31  8.719509 0.2917921   0.04451333 1.52335000  0.03458602 0.05070083
## R32  6.642488 0.2224302   0.03314928 0.75935657  0.01985540 0.02406519
## R33  9.218475 0.3206063   0.04025070 1.07517412  0.02147511 0.04536975
## R34  6.435524 0.2135714   0.04781338 1.00828074  0.02697322 0.04630859
## R35  8.880293 0.2698323   0.06744288 1.69473785  0.04507291 0.08184510
## R36  5.979113 0.2441023   0.06011517 1.17138908  0.03705746 0.04471346
## R13  3.051265 0.1587890   0.00000000 0.05988682  0.00000000 0.00000000
## R14  4.022243 0.2649193   0.00000000 0.16343697  0.00000000 0.00000000
## R15  2.060242 0.4008345   0.00000000 0.32230558  0.00000000 0.00000000
## R16  2.065274 0.3575574   0.00000000 0.05856690  0.00000000 0.00000000
## R17  3.685242 0.2450650   0.00000000 0.26577665  0.00000000 0.00000000
## R18  2.276606 0.4004846   0.00000000 0.15572863  0.00000000 0.00000000
## R19  2.775194 0.2418976   0.06122209 0.16288653  0.06118418 0.05317155
## R20  3.659261 0.2127580   0.00000000 0.74450130  0.00000000 0.00000000
## R21  3.008143 0.2394504   0.00000000 0.09711944  0.00000000 0.00000000
## R22  2.797761 0.1244424   0.00000000 0.07069875  0.00000000 0.00000000
## R23  3.818070 0.2752683   0.00000000 0.08078066  0.00000000 0.00000000
## R24  3.078666 0.2428717   0.00000000 0.06046572  0.00000000 0.00000000
## R37  2.540922 0.3583397   0.00000000 0.33848057  0.00000000 0.00000000
## R38  2.216871 0.3811121   0.00000000 0.17483102  0.00000000 0.00000000
## R39  1.937005 0.2666815   0.00000000 0.14572444  0.00000000 0.00000000
## R40  4.048858 0.1958996   0.00000000 0.07459486  0.00000000 0.00000000
## R41  3.343698 0.2100220   0.00000000 0.15205560  0.00000000 0.00000000
## R42  2.766247 0.2873487   0.09295739 0.26150544  0.07561294 0.08973810
## R43  3.201230 0.5728650   0.00000000 0.15850144  0.00000000 0.00000000
## R44  5.601203 0.3565805   0.00000000 0.09144850  0.00000000 0.00000000
## R45  3.817896 0.2534805   0.00000000 0.23164761  0.00000000 0.00000000
## R46  4.270382 0.3910062   0.00000000 0.19248275  0.00000000 0.00000000
## R47  3.335548 0.3513265   0.02224919 0.06549052  0.00000000 0.00000000
## R48  2.386966 0.2748988   0.00000000 0.43317059  0.00000000 0.00000000
##       hexanoic
## R01 0.08802237
## R02 0.01549625
## R03 0.10609609
## R04 0.00000000
## R05 0.06308093
## R06 0.04631728
## R07 0.06971790
## R09 0.04058588
## R10 0.06044937
## R11 0.12464955
## R12 0.01712496
## R25 0.08287127
## R26 0.18375741
## R27 0.01467142
## R28 0.04373499
## R29 0.02202622
## R30 0.03269296
## R31 0.02251866
## R32 0.01674318
## R33 0.14119839
## R34 0.05045684
## R35 0.14247081
## R36 0.08931266
## R13 0.00000000
## R14 0.00000000
## R15 0.00000000
## R16 0.00000000
## R17 0.00000000
## R18 0.00000000
## R19 0.04677862
## R20 0.00000000
## R21 0.00000000
## R22 0.00000000
## R23 0.00000000
## R24 0.00000000
## R37 0.00000000
## R38 0.00000000
## R39 0.00000000
## R40 0.00000000
## R41 0.00000000
## R42 0.07295131
## R43 0.00000000
## R44 0.00000000
## R45 0.00000000
## R46 0.00000000
## R47 0.00000000
## R48 0.00000000
# Change all zeros to NA
dat.clean[dat.clean == 0] <- NA

# Summarise samples per treatment group
tb <- dat.clean %>% group_by(across(all_of("treatment"))) %>% get_summary_stats(type = "mean_sd")

7.1.1 DAtest (treatment)

# Test best method 
filt.test <- testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize = 10, relative = FALSE, k = c(1,1,2))
## Warning in testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize =
## 10, : Dataset contains very few features
## Running on 7 cores
## Warning in testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize =
## 10, : Very few features spiked. Increase 'k' or set 'R' to more than 50 to
## ensure proper estimation of AUC and FPR
## Warning in testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize =
## 10, : Set to spike more than half of the dataset, which might give unreliable
## estimates, Change k argument
## predictor is assumed to be a categorical variable with 4 levels: CTRL, PFOS, VAN, VAN+PFOS
## Spikeing...
## Testing 7 methods 20 times each...
## 
# Evaluate the plot and summary table
sum.fil <- summary(filt.test)
##                   Method AUC FPR FDR Power Score Score.5% Score.95%  
##              ANOVA (aov)   1   0   0   1.0  0.50     0.04       0.5 *
##          Log ANOVA (lao)   1   0   0   1.0  0.50     0.04       0.5 *
##              LIMMA (lim)   1   0   0   1.0  0.50    -0.38       0.5 *
##  Linear regression (lrm)   1   0   0   1.0  0.50    -0.38       0.5 *
##          Log LIMMA (lli)   1   0   0   1.0  0.50    -0.38       0.5 *
##    Log Linear reg. (llm)   1   0   0   1.0  0.50    -0.38       0.5 *
##     Kruskal-Wallis (kru)   1   0   0   0.5  0.25    -0.26       0.5 *
p.fil <- plot(filt.test)
## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun` argument instead.
## ℹ The deprecated feature was likely used in the DAtest package.
##   Please report the issue at <https://github.com/Russel88/DAtest/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `fun.ymin` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun.min` argument instead.
## ℹ The deprecated feature was likely used in the DAtest package.
##   Please report the issue at <https://github.com/Russel88/DAtest/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `fun.ymax` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun.max` argument instead.
## ℹ The deprecated feature was likely used in the DAtest package.
##   Please report the issue at <https://github.com/Russel88/DAtest/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p.fil

# Run DAtest
DA.ttt(t(dat.SCFA), dat.clean$van)
##                      pval     pval.adj      log2FC ordering      Feature
## acetic       1.556877e-02 1.816357e-02  0.06584986   yes>no       acetic
## propanoic    1.234740e-07 4.321591e-07  1.43164499   yes>no    propanoic
## m2_propanoic 9.330018e-03 1.306202e-02 -1.15449215   no>yes m2_propanoic
## butanoic     2.009708e-08 1.406795e-07 -1.27840244   no>yes     butanoic
## m3_butanoic  7.450974e-02 7.450974e-02 -0.86743747   no>yes  m3_butanoic
## pentanoic    5.798769e-03 1.014785e-02 -1.25935721   no>yes    pentanoic
## hexanoic     2.431332e-04 5.673108e-04 -1.65518822   no>yes     hexanoic
##                    Method
## acetic       t-test (ttt)
## propanoic    t-test (ttt)
## m2_propanoic t-test (ttt)
## butanoic     t-test (ttt)
## m3_butanoic  t-test (ttt)
## pentanoic    t-test (ttt)
## hexanoic     t-test (ttt)
DA.aov(t(dat.SCFA), dat.clean$treatment) # ANOVA
##                      pval     pval.adj      Feature      Method
## acetic       8.187692e-02 9.552307e-02       acetic ANOVA (aov)
## propanoic    2.239608e-07 1.567725e-06    propanoic ANOVA (aov)
## m2_propanoic 6.695521e-02 9.373730e-02 m2_propanoic ANOVA (aov)
## butanoic     4.750816e-07 1.662786e-06     butanoic ANOVA (aov)
## m3_butanoic  3.469938e-01 3.469938e-01  m3_butanoic ANOVA (aov)
## pentanoic    4.934198e-02 8.634847e-02    pentanoic ANOVA (aov)
## hexanoic     2.998526e-03 6.996560e-03     hexanoic ANOVA (aov)
DA.per(t(dat.SCFA), dat.clean$pfos) # Permutation test (PFOS)
##        Feature       pval        log2FC coverage ordering  pval.adj
## 1       acetic 0.09269073 -0.0010949396     1000   no>yes 0.1081392
## 2    propanoic 0.07239276  0.0051716716     1000   yes>no 0.1013499
## 3 m2_propanoic 0.05649435  0.0013391493     1000   yes>no 0.1013499
## 4     butanoic 0.07079292 -0.0075001412     1000   no>yes 0.1013499
## 5  m3_butanoic 0.06929307  0.0007623341     1000   yes>no 0.1013499
## 6    pentanoic 0.07049295  0.0009115844     1000   yes>no 0.1013499
## 7     hexanoic 0.44345565  0.0016915617    10000   yes>no 0.4434557
##              Method
## 1 Permutation (per)
## 2 Permutation (per)
## 3 Permutation (per)
## 4 Permutation (per)
## 5 Permutation (per)
## 6 Permutation (per)
## 7 Permutation (per)
DA.per(t(dat.SCFA), dat.clean$van) # permutation test
##        Feature       pval       log2FC coverage ordering    pval.adj
## 1       acetic 0.01829817  0.030019941    10000   yes>no 0.021347865
## 2    propanoic 0.00010000  0.073564594    10000   yes>no 0.000350000
## 3 m2_propanoic 0.00509949 -0.005387152    10000   no>yes 0.007139286
## 4     butanoic 0.00010000 -0.102207484    10000   no>yes 0.000350000
## 5  m3_butanoic 0.07109289 -0.003099900    10000   no>yes 0.071092891
## 6    pentanoic 0.00219978 -0.005363531    10000   no>yes 0.003849615
## 7     hexanoic 0.00029997 -0.007433509    10000   no>yes 0.000699930
##              Method
## 1 Permutation (per)
## 2 Permutation (per)
## 3 Permutation (per)
## 4 Permutation (per)
## 5 Permutation (per)
## 6 Permutation (per)
## 7 Permutation (per)
DA.kru(t(dat.SCFA), dat.clean$treatment) # Kruskal-Wallis
##                      pval     pval.adj      Feature               Method
## acetic       6.138004e-02 6.138004e-02       acetic Kruskal-Wallis (kru)
## propanoic    2.245807e-07 1.572065e-06    propanoic Kruskal-Wallis (kru)
## m2_propanoic 2.640232e-05 3.080271e-05 m2_propanoic Kruskal-Wallis (kru)
## butanoic     2.509298e-05 3.080271e-05     butanoic Kruskal-Wallis (kru)
## m3_butanoic  5.597385e-06 1.555731e-05  m3_butanoic Kruskal-Wallis (kru)
## pentanoic    6.667417e-06 1.555731e-05    pentanoic Kruskal-Wallis (kru)
## hexanoic     1.537043e-05 2.689826e-05     hexanoic Kruskal-Wallis (kru)

7.1.1.1 Conclusions

A significant impact of treatment type was detected (adjusted p-value < 0.05):

  • ANOVA: propanoic, butanoic and hexanoic acid
  • Permutation test (DA.per): PFOS = no impact; VAN = all but m3_butanoic
  • Kruskal-Wallis: all but acetic acid
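
The adjusted p-values in the DA.aov() table above are consistent with a Benjamini-Hochberg (FDR) correction. As a minimal sketch, assuming that this is the correction applied, the raw ANOVA p-values copied from that output can be re-adjusted with base R's p.adjust() to recover the three significant acids:

```r
# Raw p-values copied from the DA.aov() output above
pval <- c(acetic = 8.187692e-02, propanoic = 2.239608e-07,
          m2_propanoic = 6.695521e-02, butanoic = 4.750816e-07,
          m3_butanoic = 3.469938e-01, pentanoic = 4.934198e-02,
          hexanoic = 2.998526e-03)

# Benjamini-Hochberg adjustment (assumed to be the method behind pval.adj)
pval.adj <- p.adjust(pval, method = "BH")

# Features with an adjusted p-value below 0.05
significant <- names(pval.adj)[pval.adj < 0.05]
print(significant)
```

Note that pentanoic acid has a raw p-value just below 0.05 but is no longer significant after adjustment, which is why it does not appear in the ANOVA conclusion.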

7.1.1.2 PERMANOVA and PCoA (visualization)

# Scaling SCFA data
scaled.SCFA <- scale(dat.SCFA, center = FALSE, scale = TRUE)

# Calculating Bray-Curtis PCoA with capscale
tmp2 <- capscale(as.matrix(scaled.SCFA) ~ 1, data = as.matrix(scaled.SCFA), distance = "bray", metaMDS = TRUE)

# Collect data for plotting
mds.samples <- data.frame(tmp2$CA$u)
mds.scfa <- data.frame(tmp2$CA$v)

# Prepare point zero and labels for arrows
mds.scfa$label1 <- row.names(mds.scfa)
mds.scfa$x <- 0
mds.scfa$y <- 0

# Rename labels
mds.scfa <- mds.scfa %>% mutate("label2" = case_when(label1 == "acetic" ~ "Acetate",
                                                     label1 == "propanoic" ~ "Propionate",
                                                     label1 == "m2_propanoic" ~ "Isobutyrate",
                                                     label1 == "butanoic" ~ "Butyrate",
                                                     label1 == "m3_butanoic" ~ "Isovalerate",
                                                     label1 == "pentanoic" ~ "Valerate",
                                                     label1 == "hexanoic" ~ "Caproate"))

# Bind with main data
dat.mds <- cbind(dat.clean, mds.samples)

# Create plot
p <- ggplot() +
  geom_point(data = dat.mds, mapping = aes(x = MDS1, y = MDS2, color = treatment)) +
  stat_ellipse(data = dat.mds, mapping = aes(x = MDS1, y = MDS2, color = treatment, fill = treatment), geom = "polygon", alpha = 0.1) +
  geom_segment(data = mds.scfa, mapping = aes(x=x, y=y, xend=0.35*MDS1, yend=0.35*MDS2), 
               lineend = "butt",
               linejoin = "round",
               size = 0.5,
               arrow = arrow(length = unit(0.3, 'cm'))) +
  geom_label_repel(data = mds.scfa, 
             mapping = aes(x = 0.35*MDS1, y = 0.35*MDS2), #
             label = mds.scfa$label2,
             size = 4,
             min.segment.length = 0,
             segment.alpha = 0.8,
             box.padding = 0.3,
             force = 1) +
  theme_pubr(legend = "top") +
  scale_color_manual(values = params$COL, name = "Treatment") +
  scale_fill_manual(values = params$COL) +
  labs(x = "Axis 1", y = "Axis 2") +
  guides(fill = "none")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p
# Remove legend
leg <- get_legend(p)
p2 <- p + theme(legend.position = "none")

# Add marginal boxplots
p3 <- ggExtra::ggMarginal(p = p2, type = "boxplot", size = 10, groupFill = TRUE)
p3

# Add legend back to plot
p4 <- plot_grid(leg, p3, ncol = 1, rel_heights = c(0.1,1), rel_widths = 1)
p4

# Save output
suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA_weighed.png", plot = p3, device = "png", dpi = 300, height = 140, width = 140, units = "mm"))
suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA_weighed.pdf", plot = p3, device = "pdf", dpi = 300, height = 140, width = 140, units = "mm"))
######################################
# # BRAY-CURTIS
# dist.bray <- vegdist(as.matrix(dat.SCFA), method = "bray")
# 
# # Ordination
# bray.pcoa <- pcoa(dist.bray)
# 
# 
# bray.df <- data.frame(pcoa1 = bray.pcoa$vectors[,1],
#                  pcoa2 = bray.pcoa$vectors[,2],
#                  pcoa3 = bray.pcoa$vectors[,3],
#                  pcoa4 = bray.pcoa$vectors[,4],
#                  pcoa5 = bray.pcoa$vectors[,5])
# 
# # Add metadata
# dat.pcoa <- cbind(bray.df,
#                   rat_name = dat.clean$rat_name,
#                   treatment = dat.clean$treatment,
#                   pfos = dat.clean$pfos,
#                   van = dat.clean$van)
# 
# # PERMANOVA
# adonis2(dist.bray ~ van*pfos, data = dat.clean)
# 
# # Create PCoA plot
# p.pcoa <- ggplot(dat.pcoa, aes(x = pcoa1, y = pcoa2, color = treatment)) +
#   geom_point() + 
#   theme_pubr(legend = "right") + 
#   stat_ellipse() +
#   scale_color_manual(values = params$COL, name = "Treatment") #name = "Group", labels = c("No fibre","No fibre + PFOS", "Fibre","Fibre + PFOS")
# #  scale_shape_manual(values = c(16,17), name = "Dissection day", labels = c("Day 8","Day 21"))
# p.pcoa
# p.pcoa2 <- p.pcoa +theme(legend.position = "none")
# 
# # Recover legend
# leg <- get_legend(p.pcoa)
# 
# # Add marginal boxplots
# p.pcoa3 <- ggExtra::ggMarginal(p = p.pcoa2, type = 'boxplot', size = 10, groupFill = TRUE)
# # Organize plot with legend
# p.pcoa4 <- plot_grid(leg, p.pcoa3, rel_widths = c(1,0.1))
# p.pcoa4
# 
# # Save output
# suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA.png", plot = p.pcoa3, device = "png", dpi = 300, height = 140, width = 140, units = "mm"))
# suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA.pdf", plot = p.pcoa3, device = "pdf", dpi = 300, height = 140, width = 140, units = "mm"))

7.2 Formic acid / Formate

7.2.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "formic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$formic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      formic      12 0.069 0.239
## 2 PFOS      formic      12 0.244 0.484
## 3 VAN       formic      12 0.12  0.283
## 4 VAN+PFOS  formic      12 0.298 0.466

7.2.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

7.2.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R08             8 no    no     256.  265.  268.  274.  278.   283
## 2 PFOS      R28            16 yes   no     242.  248.  252.  265.  268.   273
## 3 PFOS      R31            19 yes   no     283.  284.  293.  301.  307.   315
## 4 PFOS      R32            20 yes   no     255.  262.  269.  276.  281    291
## 5 VAN       R14            26 no    yes    246.  256.  260.  267.  270.   274
## 6 VAN       R17            29 no    yes    268.  278.  274.  290.  295    295
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
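
The boxplot rule behind identify_outliers() can be sketched in base R. According to the rstatix documentation, values beyond 1.5 × IQR from the quartiles are flagged as outliers and values beyond 3 × IQR as extreme. The helper flag_outliers() and the example values below are hypothetical and only illustrate the rule for a group in which most measurements are zero (the quartiles are computed with the default quantile() settings, which is an assumption about the rstatix internals):

```r
# Illustrative helper (not part of rstatix): flag outliers with the boxplot rule
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  data.frame(
    value = x,
    # Outlier: more than 1.5 x IQR below Q1 or above Q3
    is.outlier = x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr,
    # Extreme point: more than 3 x IQR below Q1 or above Q3
    is.extreme = x < q[[1]] - 3 * iqr | x > q[[2]] + 3 * iqr
  )
}

# Hypothetical group of 12 values, 10 of them zero
x <- c(rep(0, 10), 0.7, 0.9)
res <- flag_outliers(x)
print(res[res$is.outlier, ])
```

When both quartiles are zero the IQR is zero, so every non-zero value is flagged. This may explain why a variable with few values above the limit of detection produces several flagged rows.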

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic      p.value
##   <chr>                <dbl>        <dbl>
## 1 residuals(model)     0.732 0.0000000500

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.922 0.438
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
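
What Levene's test measures can be sketched with base R alone. The median-centred variant is a one-way ANOVA on the absolute deviations of each observation from its group median. The helper levene_median() and the simulated data are hypothetical; the sketch assumes that levene_test() uses median centring by default:

```r
# Median-centred Levene test written with base R only (illustrative helper,
# not part of rstatix): one-way ANOVA on the absolute deviations of each
# observation from its group median.
levene_median <- function(y, g) {
  g <- factor(g)
  dev <- abs(y - ave(y, g, FUN = median))
  tab <- anova(lm(dev ~ g))
  c(df1 = tab$Df[1], df2 = tab$Df[2],
    statistic = tab$`F value`[1], p = tab$`Pr(>F)`[1])
}

# Simulated data with the same layout as the study (4 groups of 12 animals)
set.seed(42)
toy <- data.frame(
  treatment = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12),
  value = rnorm(48, mean = 5, sd = 1)
)
res <- levene_median(toy$value, toy$treatment)
print(res)

# Same decision rule as EQUAL.VAR in the analysis
equal.var <- res[["p"]] > 0.05
```

With four groups of 12 animals the degrees of freedom are 3 and 44, as in the Levene output above.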

The formic acid data contain very few values above the limit of detection (= 0.6), with a total of 10 valid data points. identify_outliers() flags six values, which are kept in the analysis. The Shapiro-Wilk test rejects normality and Levene’s test indicates equal variance. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR p-value adjustment.

Kruskal-Wallis test

7.2.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.        n statistic    df     p method        
## * <chr>  <int>     <dbl> <int> <dbl> <chr>         
## 1 formic    48      2.44     3 0.486 Kruskal-Wallis
7.2.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.        n effsize method  magnitude
## * <chr>  <int>   <dbl> <chr>   <ord>    
## 1 formic    48 -0.0127 eta2[H] small
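
The effect size reported above can be reproduced by hand from the Kruskal-Wallis output. A minimal sketch, using the formula given in the rstatix documentation, eta2[H] = (H - k + 1) / (n - k); the helper name eta2_h() is hypothetical:

```r
# Eta squared based on the Kruskal-Wallis H statistic:
#   eta2[H] = (H - k + 1) / (n - k)
# H = test statistic, k = number of groups, n = number of observations
eta2_h <- function(H, k, n) (H - k + 1) / (n - k)

# Values taken from the formic acid output above (H = 2.44, 4 groups, n = 48)
eta2_formic <- eta2_h(H = 2.44, k = 4, n = 48)
print(round(eta2_formic, 4))
```

The estimate is negative whenever H < k - 1, which is the case here (2.44 < 3), so the "small" label should be read together with the non-significant test result.
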
7.2.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.    group1 group2      n1    n2 statistic     p p.adj p.adj.signif
## * <chr>  <chr>  <chr>    <int> <int>     <dbl> <dbl> <dbl> <chr>       
## 1 formic CTRL   PFOS        12    12     0.986 0.324 0.648 ns          
## 2 formic CTRL   VAN         12    12     0.400 0.689 0.689 ns          
## 3 formic CTRL   VAN+PFOS    12    12     1.45  0.148 0.648 ns          
## 4 formic PFOS   VAN         12    12    -0.585 0.558 0.689 ns          
## 5 formic PFOS   VAN+PFOS    12    12     0.462 0.644 0.689 ns          
## 6 formic VAN    VAN+PFOS    12    12     1.05  0.295 0.648 ns

7.2.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM formate",limits = c(0,1.6),breaks = seq(0,1.6,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.6, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 22 rows containing missing values (`geom_point()`).

p.formic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.formic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 22 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 22 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.3 Acetic acid / Acetate

7.3.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "acetic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column and exclude the extreme outlier R08
dat.clean <- subset(dat, !is.na(dat$acetic) & dat$rat_name != "R08")

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      acetic      11  7.84 1.23 
## 2 PFOS      acetic      12  7.19 2.49 
## 3 VAN       acetic      12  3.02 0.674
## 4 VAN+PFOS  acetic      12  3.29 1.04

7.3.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

7.3.3 Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R10            10 no    no     266.  273.  275.  285.  291.   294
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain two outliers, one of which is extreme (R08). The extreme outlier has been removed from the analysis, so only the remaining outlier (R10) appears in the table above.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.966   0.189

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    43      8.35 0.000174
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

Most data for this set are above the limit of detection (= 0.5). The acetic acid concentration has two outliers, one of which is extreme and has been removed from the analysis. The Shapiro-Wilk test shows normality, but Levene’s test shows unequal variance. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR p-value adjustment.

7.3.4 Kruskal-Wallis test

7.3.4.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.        n statistic    df           p method        
## * <chr>  <int>     <dbl> <int>       <dbl> <chr>         
## 1 acetic    47      31.4     3 0.000000692 Kruskal-Wallis
7.3.4.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.        n effsize method  magnitude
## * <chr>  <int>   <dbl> <chr>   <ord>    
## 1 acetic    47   0.661 eta2[H] large
7.3.4.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.    group1 group2      n1    n2 statistic         p     p.adj p.adj.signif
## * <chr>  <chr>  <chr>    <int> <int>     <dbl>     <dbl>     <dbl> <chr>       
## 1 acetic CTRL   PFOS        11    12    -0.478 0.633     0.743     ns          
## 2 acetic CTRL   VAN         11    12    -4.31  0.0000165 0.0000992 ****        
## 3 acetic CTRL   VAN+PFOS    11    12    -3.99  0.0000670 0.000181  ***         
## 4 acetic PFOS   VAN         12    12    -3.92  0.0000903 0.000181  ***         
## 5 acetic PFOS   VAN+PFOS    12    12    -3.59  0.000333  0.000500  ***         
## 6 acetic VAN    VAN+PFOS    12    12     0.328 0.743     0.743     ns

7.3.4.1 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM acetate",limits = c(0,15),breaks = seq(0,15,2)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment")  +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "#2f2f2f")

p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(14,15,12,13))
p

p.acetic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.acetic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
PFOS amount in mg at day 4 and 8

7.4 Propanoic acid / Propionate

7.4.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$propanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable      n  mean    sd
##   <chr>     <fct>     <dbl> <dbl> <dbl>
## 1 CTRL      propanoic    12 0.268 0.113
## 2 PFOS      propanoic    12 0.265 0.095
## 3 VAN       propanoic    12 0.264 0.086
## 4 VAN+PFOS  propanoic    12 0.325 0.102

7.4.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
    This is already done for the whole project

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R08             8 no    no     256.  265.  268.  274.  278.   283
## 2 PFOS      R28            16 yes   no     242.  248.  252.  265.  268.   273
## 3 VAN       R15            27 no    yes    268.  277.  283.  290.  296.   300
## 4 VAN       R18            30 no    yes    266.  275.  282.  285.  288.   298
## 5 VAN       R22            34 no    yes    292.  296.  301.  313.  311.   321
## 6 VAN+PFOS  R43            43 yes   yes    292.  301.  300.  313.  316.   322
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain six outliers, none of which are extreme. As removing them does not affect the final outcome, they are left in the analysis.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.980   0.586

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.280 0.840
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

The propanoic acid concentration has six outliers, the Shapiro-Wilk test shows normality and Levene’s test shows equal variance. We therefore use a one-way ANOVA followed by Tukey’s honest significant difference (HSD) test.

ANOVA One-Way test

7.4.2.1 Perform test

Depending on the outcome of Levene’s test, we run either a one-way ANOVA with anova_test() (equal variances) or a Welch one-way ANOVA with welch_anova_test() (unequal variances).

if(EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
## 
##      Effect DFn DFd     F     p p<.05   ges
## 1 treatment   3  44 1.068 0.372       0.068

7.4.2.2 Perform posthoc test

A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. After a Welch one-way ANOVA, which relaxes the equal-variance assumption, the Games-Howell post-hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all pairs of groups.

if(EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 6 × 9
##   term   group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr>  <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl> <dbl> <chr>       
## 1 treat… CTRL   PFOS            0 -0.00295  -0.111      0.105 1     ns          
## 2 treat… CTRL   VAN             0 -0.00466  -0.113      0.104 0.999 ns          
## 3 treat… CTRL   VAN+P…          0  0.0566   -0.0516     0.165 0.508 ns          
## 4 treat… PFOS   VAN             0 -0.00171  -0.110      0.107 1     ns          
## 5 treat… PFOS   VAN+P…          0  0.0596   -0.0487     0.168 0.464 ns          
## 6 treat… VAN    VAN+P…          0  0.0613   -0.0470     0.170 0.44  ns
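
The estimate column in the Tukey table is the difference in group means, group2 minus group1. For example, CTRL vs VAN+PFOS gives 0.325 - 0.268 ≈ 0.057, in line with the summary statistics in 7.4.1. A minimal sketch of the same comparison with base R's aov() and TukeyHSD() instead of rstatix, run on simulated data (hypothetical values, not the study data):

```r
# Simulated data with the same layout as the study (4 groups of 12 animals)
set.seed(1)
toy <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  value = rnorm(48, mean = 0.28, sd = 0.1)
)

# One-way ANOVA followed by Tukey's HSD for all six pairwise comparisons
fit <- aov(value ~ treatment, data = toy)
tuk <- TukeyHSD(fit)$treatment
print(round(tuk, 3))

# The 'diff' column is the later group mean minus the earlier group mean
means <- tapply(toy$value, toy$treatment, mean)
diff_pfos_ctrl <- means[["PFOS"]] - means[["CTRL"]]
```

Each row of tuk holds the mean difference, its 95 % confidence interval and the adjusted p-value. A confidence interval that spans zero corresponds to a non-significant comparison.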

7.4.3 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM propionate",limits = c(0,0.6),breaks = seq(0,0.6,0.2)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 1 rows containing missing values (`geom_point()`).

p.propanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.propanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.5 2-methyl-Propanoic acid / Isobutyrate

7.5.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m2_propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m2_propanoic)) # & !dat$rat_name == "R01")

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable         n  mean    sd
##   <chr>     <fct>        <dbl> <dbl> <dbl>
## 1 CTRL      m2_propanoic    12 0.049 0.025
## 2 PFOS      m2_propanoic    12 0.05  0.017
## 3 VAN       m2_propanoic    12 0.005 0.018
## 4 VAN+PFOS  m2_propanoic    12 0.01  0.027

7.5.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This has already been established for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no     310   321.  325.  339.  350    354
## 2 CTRL      R08             8 no    no     256.  265.  268.  274.  278.   283
## 3 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 4 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## 5 VAN+PFOS  R47            47 yes   yes    242.  249.  255.  263.  267.   271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
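
The boxplot rule behind this table can be sketched in base R. The rstatix documentation describes identify_outliers() as flagging values beyond Q1 - 1.5 x IQR or Q3 + 1.5 x IQR as outliers, and values beyond Q1 - 3 x IQR or Q3 + 3 x IQR as extreme. The helper `flag_outliers()` and the example vector below are hypothetical and only illustrate the rule; they are not part of the project code, and the quantile type used by rstatix may differ slightly.

```r
# Minimal sketch of the boxplot (IQR) outlier rule on hypothetical data
flag_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  data.frame(
    value      = x,
    # outside the 1.5 x IQR fences
    is.outlier = x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr,
    # outside the 3 x IQR fences
    is.extreme = x < q[1] - 3 * iqr | x > q[2] + 3 * iqr
  )
}

example <- c(1, 2, 3, 4, 5, 100)  # hypothetical values, not project data
res <- flag_outliers(example)
res[res$is.outlier, ]
```

With many values at exactly 0, as in the vancomycin groups, the IQR can shrink towards 0, so that almost any detected value is flagged as extreme.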

The data contain five outliers (R01, R08, R19, R42 and R47). Removing the extreme outlier R01 (see the commented-out filter in the data preparation step) leads to a new, non-extreme outlier. In the analysis shown here, all outliers are retained.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic     p.value
##   <chr>                <dbl>       <dbl>
## 1 residuals(model)     0.755 0.000000145

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.633 0.598
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

The 2-methyl-propanoic acid concentration has five outliers, of which four are extreme. These arise because several samples have a concentration of 0, so they are left in. The Shapiro-Wilk test indicates that the residuals are not normally distributed (p < 0.001), whereas Levene’s test gives no evidence of unequal variances (p = 0.598). Furthermore, very few vancomycin-treated samples are above the Limit of Detection (= 0.02). We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR-adjusted p-values.

7.5.3 Kruskal-Wallis test

7.5.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.              n statistic    df          p method        
## * <chr>        <int>     <dbl> <int>      <dbl> <chr>         
## 1 m2_propanoic    48      26.2     3 0.00000882 Kruskal-Wallis
7.5.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.              n effsize method  magnitude
## * <chr>        <int>   <dbl> <chr>   <ord>    
## 1 m2_propanoic    48   0.526 eta2[H] large
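
As a worked check, the effect size can be recomputed from the Kruskal-Wallis output above. The rstatix documentation gives the formula eta2[H] = (H - k + 1) / (n - k), where H is the test statistic, k the number of groups and n the total number of observations. The helper name `kw_eta2()` is hypothetical; the inputs are the rounded values printed above (H = 26.2, k = 4, n = 48).

```r
# eta2[H] = (H - k + 1) / (n - k)
kw_eta2 <- function(H, k, n) (H - k + 1) / (n - k)

# Rounded values from the Kruskal-Wallis output for m2_propanoic
eta2 <- kw_eta2(H = 26.2, k = 4, n = 48)
round(eta2, 3)
```

The result differs from the reported 0.526 only in the third decimal, because H is rounded to one decimal in the printed table.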
7.5.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.          group1 group2     n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>        <chr>  <chr>   <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 m2_propanoic CTRL   PFOS       12    12    0.0383 9.69e-1 9.69e-1 ns          
## 2 m2_propanoic CTRL   VAN        12    12   -3.73   1.94e-4 5.82e-4 ***         
## 3 m2_propanoic CTRL   VAN+PF…    12    12   -3.46   5.44e-4 8.15e-4 ***         
## 4 m2_propanoic PFOS   VAN        12    12   -3.76   1.67e-4 5.82e-4 ***         
## 5 m2_propanoic PFOS   VAN+PF…    12    12   -3.50   4.71e-4 8.15e-4 ***         
## 6 m2_propanoic VAN    VAN+PF…    12    12    0.268  7.88e-1 9.46e-1 ns
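
The p.adj column can be reproduced from the p column with base R, which makes the FDR (Benjamini-Hochberg) adjustment explicit. The vector below contains the rounded unadjusted p-values from the table above, in row order; small deviations in the last digit come from that rounding.

```r
# Unadjusted p-values from the Dunn's test table (rows 1 to 6, rounded)
p_raw <- c(9.69e-1, 1.94e-4, 5.44e-4, 1.67e-4, 4.71e-4, 7.88e-1)

# "fdr" is an alias for the Benjamini-Hochberg method in p.adjust()
p_fdr <- p.adjust(p_raw, method = "fdr")
signif(p_fdr, 3)
```

Each p-value is multiplied by the number of comparisons divided by its rank, and the adjusted values are then made monotone, which is why rows 2 and 4 (and rows 3 and 5) share the same adjusted value.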

7.5.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM isobutyrate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.102,0.11,0.096,0.088))
p
## Warning: Removed 13 rows containing missing values (`geom_point()`).

p.m2p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m2p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 13 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 13 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.6 Butanoic acid / Butyrate

7.6.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$butanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      butanoic    12 1.28  0.655
## 2 PFOS      butanoic    12 1.14  0.625
## 3 VAN       butanoic    12 0.187 0.195
## 4 VAN+PFOS  butanoic    12 0.193 0.109

7.6.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This has already been established for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R20            32 no    yes    285   293.  301.  310.  317.   321
## 2 VAN+PFOS  R48            48 yes   yes    224.  229.  234.  239.  242.   250
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain two outliers, one of which is extreme (R20). This outlier does not affect the final results or the type of analysis and has been left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.938  0.0130

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    44      7.69 0.000309
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
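
What Levene’s test computes can be sketched in base R. The warning above shows that levene_test() calls car::leveneTest(), which by default centres each group on its median and then runs a one-way ANOVA on the absolute deviations. The data frame `toy` below is simulated and hypothetical; it is not project data.

```r
set.seed(1)  # simulated, hypothetical data with one high-variance group
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  value = c(rnorm(10, sd = 1), rnorm(10, sd = 1), rnorm(10, sd = 4))
)

# Absolute deviation of each value from its group median
toy$abs_dev <- abs(toy$value - ave(toy$value, toy$group, FUN = median))

# One-way ANOVA on the deviations: the F test is Levene's test
levene <- anova(lm(abs_dev ~ group, data = toy))
levene
```

A small p-value means the groups differ in their typical spread around the median, which is the situation flagged for butanoic acid above.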

Most data for this set are above the Limit of Detection (= 0.03). The butanoic acid concentration has two outliers, one of which is extreme; both have been left in the analysis. The Shapiro-Wilk test indicates that the residuals are not normally distributed (p = 0.013) and Levene’s test indicates unequal variances (p < 0.001). We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR-adjusted p-values.

7.6.3 Kruskal-Wallis test

7.6.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.          n statistic    df         p method        
## * <chr>    <int>     <dbl> <int>     <dbl> <chr>         
## 1 butanoic    48      27.4     3 0.0000048 Kruskal-Wallis
7.6.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.          n effsize method  magnitude
## * <chr>    <int>   <dbl> <chr>   <ord>    
## 1 butanoic    48   0.555 eta2[H] large
7.6.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.      group1 group2      n1    n2 statistic         p    p.adj p.adj.signif
## * <chr>    <chr>  <chr>    <int> <int>     <dbl>     <dbl>    <dbl> <chr>       
## 1 butanoic CTRL   PFOS        12    12   -0.0729 0.942     0.942    ns          
## 2 butanoic CTRL   VAN         12    12   -3.95   0.0000777 0.000315 ***         
## 3 butanoic CTRL   VAN+PFOS    12    12   -3.50   0.000467  0.000918 ***         
## 4 butanoic PFOS   VAN         12    12   -3.88   0.000105  0.000315 ***         
## 5 butanoic PFOS   VAN+PFOS    12    12   -3.43   0.000612  0.000918 ***         
## 6 butanoic VAN    VAN+PFOS    12    12    0.452  0.651     0.782    ns

7.6.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM butyrate",limits = c(0,3),breaks = seq(0,3,0.5)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.6,3,2.4,2.8))
p
## Warning: Removed 1 rows containing missing values (`geom_point()`).

p.butanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.butanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.7 3-methyl-Butanoic acid / Isovalerate

7.7.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m3_butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m3_butanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable        n  mean    sd
##   <chr>     <fct>       <dbl> <dbl> <dbl>
## 1 CTRL      m3_butanoic    12 0.03  0.017
## 2 PFOS      m3_butanoic    12 0.033 0.012
## 3 VAN       m3_butanoic    12 0.005 0.018
## 4 VAN+PFOS  m3_butanoic    12 0.006 0.022

7.7.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This has already been established for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R01             1 no    no     310   321.  325.  339.  350    354
## 2 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 3 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain three outliers, which arise because several data points are below the Limit of Detection. Furthermore, removing the extreme outliers does not affect the result or the type of analysis, so they have been left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic       p.value
##   <chr>                <dbl>         <dbl>
## 1 residuals(model)     0.678 0.00000000551

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.391 0.760
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

The 3-methyl-butanoic acid concentration has three outliers, which have been left in. The Shapiro-Wilk test indicates that the residuals are not normally distributed (p < 0.001), whereas Levene’s test gives no evidence of unequal variances (p = 0.760). Furthermore, very few vancomycin-treated samples are above the Limit of Detection (= 0.02). We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR-adjusted p-values.

7.7.3 Kruskal-Wallis test

7.7.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.             n statistic    df         p method        
## * <chr>       <int>     <dbl> <int>     <dbl> <chr>         
## 1 m3_butanoic    48      25.6     3 0.0000117 Kruskal-Wallis
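
The statistic reported above is the Kruskal-Wallis H, which is computed from the rank sums of the groups. The sketch below recomputes H by hand on a small hypothetical data set without ties and compares it with base R’s kruskal.test(). The data frame `toy` is an assumption for illustration, and the tie correction needed for real data with repeated values (such as many zeros) is deliberately left out.

```r
# Hypothetical data: three groups of five, no tied values
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 5)),
  value = 1:15
)

r <- rank(toy$value)                   # ranks over all observations
N <- length(r)
rank_sums <- sapply(split(r, toy$group), sum)
n_i <- lengths(split(r, toy$group))

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), valid without ties
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

H_ref <- unname(kruskal.test(value ~ group, data = toy)$statistic)
c(by_hand = H, kruskal.test = H_ref)
```

Under the null hypothesis H is compared against a chi-squared distribution with k - 1 degrees of freedom, which corresponds to the df = 3 shown for the four treatment groups.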
7.7.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.             n effsize method  magnitude
## * <chr>       <int>   <dbl> <chr>   <ord>    
## 1 m3_butanoic    48   0.513 eta2[H] large
7.7.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.         group1 group2      n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>       <chr>  <chr>    <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 m3_butanoic CTRL   PFOS        12    12    0.556  5.78e-1 6.94e-1 ns          
## 2 m3_butanoic CTRL   VAN         12    12   -3.29   9.96e-4 1.67e-3 **          
## 3 m3_butanoic CTRL   VAN+PFOS    12    12   -3.26   1.11e-3 1.67e-3 **          
## 4 m3_butanoic PFOS   VAN         12    12   -3.85   1.19e-4 4.05e-4 ***         
## 5 m3_butanoic PFOS   VAN+PFOS    12    12   -3.82   1.35e-4 4.05e-4 ***         
## 6 m3_butanoic VAN    VAN+PFOS    12    12    0.0309 9.75e-1 9.75e-1 ns
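
The statistic column of Dunn’s test is a z-score built from the difference in mean ranks between two groups. The sketch below computes it by hand for hypothetical data. The helper `dunn_z()` and the data frame `toy` are assumptions for illustration, and the sign convention (group2 minus group1) is assumed from the pattern in the table above, where comparisons against the vancomycin groups are negative.

```r
# Hypothetical data: three groups of five, no tied values
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 5)),
  value = 1:15
)

r <- rank(toy$value)                       # ranks over all observations
N <- length(r)
mean_rank <- sapply(split(r, toy$group), mean)
n_i <- lengths(split(r, toy$group))

# Tie correction term; it is 0 when no values are tied
ties <- table(r)
tie_term <- sum(ties^3 - ties) / (12 * (N - 1))

dunn_z <- function(g1, g2) {
  se <- sqrt((N * (N + 1) / 12 - tie_term) * (1 / n_i[[g1]] + 1 / n_i[[g2]]))
  (mean_rank[[g2]] - mean_rank[[g1]]) / se
}

z_ab <- dunn_z("A", "B")
p_ab <- 2 * pnorm(-abs(z_ab))              # two-sided unadjusted p-value
c(z = z_ab, p = p_ab)
```

The unadjusted p-values obtained this way are then corrected for the six pairwise comparisons, here with the FDR method.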

7.7.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM isovalerate",limits = c(0,0.1),breaks = seq(0,0.1,0.02)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.092,0.1,0.084,0.076))
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).

p.m3b <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m3b,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.8 Pentanoic acid / Valerate

7.8.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$pentanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable      n  mean    sd
##   <chr>     <fct>     <dbl> <dbl> <dbl>
## 1 CTRL      pentanoic    12 0.047 0.023
## 2 PFOS      pentanoic    12 0.048 0.025
## 3 VAN       pentanoic    12 0.004 0.015
## 4 VAN+PFOS  pentanoic    12 0.007 0.026

7.8.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

Assumptions and preliminary tests

The ANOVA tests assume the following characteristics about the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This has already been established for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R26            14 yes   no     238.  246.  248   257   259.   265
## 2 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 3 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain three outliers, which arise because several data points are below the Limit of Detection. Furthermore, removing the extreme outliers does not affect the result or the type of analysis, so they have been left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic    p.value
##   <chr>                <dbl>      <dbl>
## 1 residuals(model)     0.820 0.00000372
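
For a model with a single categorical predictor, the residuals tested here are simply each observation minus its group mean. The sketch below shows this on hypothetical data; the data frame `toy` is an assumption for illustration and is not project data.

```r
# Hypothetical data: two groups of four observations
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 4)),
  value = c(1, 2, 3, 6, 10, 12, 13, 17)
)

fit <- lm(value ~ group, data = toy)
res_lm <- unname(residuals(fit))

# Same residuals by hand: observation minus its group mean
res_manual <- toy$value - ave(toy$value, toy$group)

all.equal(res_lm, res_manual)

# Shapiro-Wilk test on the pooled residuals, as done for the model above
shapiro.test(res_lm)
```

Testing the pooled residuals rather than each group separately uses all observations in one test, which helps with only 12 animals per group.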

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. Levene’s test can also be used to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44      1.72 0.177
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
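
Conceptually, Levene’s test is a one-way ANOVA on the absolute deviations of each observation from its group centre. The base R sketch below uses the group median as centre (to the best of our understanding the default in car::leveneTest, which the warning above shows is called internally) and the built-in PlantGrowth data as a stand-in for the project data.

```r
# Sketch: Levene-type test as an ANOVA on absolute deviations from group medians
dat_toy <- PlantGrowth

# Median of each group, repeated for every observation in that group
grp_median <- ave(dat_toy$weight, dat_toy$group, FUN = median)
dat_toy$abs_dev <- abs(dat_toy$weight - grp_median)

# One-way ANOVA on the absolute deviations
lev <- anova(lm(abs_dev ~ group, data = dat_toy))
lev_F <- lev[["F value"]][1]
lev_p <- lev[["Pr(>F)"]][1]

# Same decision rule as EQUAL.VAR in the analysis code
equal_var <- lev_p > 0.05
```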

This shows that the pentanoic acid concentration has three outliers, which have been left in for the analysis. The Shapiro-Wilk test rejects normality of the residuals, while Levene’s test indicates equal variances. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR-adjusted p-values.

7.8.3 Kruskal-Wallis test

7.8.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.           n statistic    df          p method        
## * <chr>     <int>     <dbl> <int>      <dbl> <chr>         
## 1 pentanoic    48      27.3     3 0.00000509 Kruskal-Wallis
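
For orientation, the Kruskal-Wallis H statistic can be computed by hand from the ranks. The sketch below uses small hypothetical data without ties (so no tie correction is needed) and compares the result with stats::kruskal.test().

```r
# Hypothetical data: three groups of three values, no ties
x <- c(1.1, 2.3, 3.7, 4.2, 5.9, 6.4, 7.8, 8.5, 9.6)
g <- factor(rep(c("A", "B", "C"), each = 3))

r <- rank(x)                     # ranks over all observations
n <- length(x)
rank_sums <- tapply(r, g, sum)   # sum of ranks per group
n_j <- tapply(r, g, length)      # group sizes

# H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1)
H <- 12 / (n * (n + 1)) * sum(rank_sums^2 / n_j) - 3 * (n + 1)

# Reference value from base R and the chi-squared p-value with k - 1 df
H_ref <- unname(kruskal.test(x, g)$statistic)
p <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)
```

Because the test only uses ranks, values at or below the limit of detection enter as tied low ranks rather than as extreme numeric values.
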
7.8.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.           n effsize method  magnitude
## * <chr>     <int>   <dbl> <chr>   <ord>    
## 1 pentanoic    48   0.552 eta2[H] large
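
The reported effect size follows from the H statistic as eta2[H] = (H - k + 1) / (n - k), where k is the number of groups and n the number of observations. The sketch below applies this to the rounded H statistics printed in this document; `eta2_h` is a hypothetical helper name, and small differences from the printed values are due to rounding of H.

```r
# eta2[H] = (H - k + 1) / (n - k); `eta2_h` is a hypothetical helper name
eta2_h <- function(H, k, n) (H - k + 1) / (n - k)

# Rounded H statistics as printed in this document (k = 4 groups, n = 48 rats)
H_printed <- c(pentanoic = 27.3, m4_pentanoic = 1.32, hexanoic = 28.0)
est <- eta2_h(H_printed, k = 4, n = 48)
round(est, 3)
```

The estimate becomes negative whenever H is smaller than k - 1, which in practice means no detectable effect.
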
7.8.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.       group1 group2      n1    n2 statistic        p    p.adj p.adj.signif
## * <chr>     <chr>  <chr>    <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
## 1 pentanoic CTRL   PFOS        12    12    0.0155 0.988    0.988    ns          
## 2 pentanoic CTRL   VAN         12    12   -3.76   0.000173 0.000448 ***         
## 3 pentanoic CTRL   VAN+PFOS    12    12   -3.62   0.000299 0.000448 ***         
## 4 pentanoic PFOS   VAN         12    12   -3.77   0.000163 0.000448 ***         
## 5 pentanoic PFOS   VAN+PFOS    12    12   -3.63   0.000282 0.000448 ***         
## 6 pentanoic VAN    VAN+PFOS    12    12    0.139  0.889    0.988    ns
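
The p.adj column can be reproduced from the raw p-values in the table above with stats::p.adjust(), since "fdr" is an alias for the Benjamini-Hochberg method. The sketch below also spells out the step-up calculation by hand; the inputs are the rounded p-values as printed, so the last digits may differ slightly.

```r
# Raw Dunn p-values as printed above (rounded), in table order
p_raw <- c(0.988, 0.000173, 0.000299, 0.000163, 0.000282, 0.889)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH")
p_fdr <- p.adjust(p_raw, method = "fdr")

# The same adjustment by hand: scale each p-value by m / rank,
# then take the running minimum from the largest p-value downwards
m <- length(p_raw)
o <- order(p_raw, decreasing = TRUE)
p_manual <- numeric(m)
p_manual[o] <- pmin(1, cummin(p_raw[o] * m / (m:1)))

signif(p_fdr, 3)
```

The running minimum is the reason why the four significant comparisons share the same adjusted p-value.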

7.8.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM valerate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.086,0.094,0.11,0.102))
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).

p.pentanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.pentanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.9 4-methyl-Pentanoic acid / Isocaproate

7.9.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m4_pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m4_pentanoic))# & !dat$rat_name %in% c("R42","R45"))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable         n  mean    sd
##   <chr>     <fct>        <dbl> <dbl> <dbl>
## 1 CTRL      m4_pentanoic    12 0.018 0.023
## 2 PFOS      m4_pentanoic    12 0.03  0.058
## 3 VAN       m4_pentanoic    12 0.035 0.074
## 4 VAN+PFOS  m4_pentanoic    12 0.216 0.513

7.9.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

#### Assumptions and preliminary tests

ANOVA assumes the following characteristics of the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This has already been verified for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS      R25            13 yes   no     339.  340.  353.  364.  348.   358
## 2 PFOS      R28            16 yes   no     242.  248.  252.  265.  268.   273
## 3 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 4 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## 5 VAN+PFOS  R45            45 yes   yes    234.  239.  244.  253.  262.   263
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain five outliers; however, these arise because several data points are below the limit of detection (= 0.03). These data points are therefore kept.
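
How many measurements exceed the limit of detection in each group can be tabulated directly. The sketch below uses a hypothetical toy data frame (the values are invented for illustration) together with the limit of detection of 0.03 mM stated above.

```r
lod <- 0.03  # limit of detection in mM, as stated in the text

# Hypothetical toy data, three animals per treatment group
toy <- data.frame(
  treatment = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 3),
  conc = c(0, 0.04, 0, 0, 0, 0.15, 0.02, 0, 0, 0, 1.7, 0.05)
)

# Number and proportion of values above the limit of detection per group
n_above <- tapply(toy$conc > lod, toy$treatment, sum)
prop_above <- tapply(toy$conc > lod, toy$treatment, mean)
n_above
```

A table like this makes it easy to judge whether a compound has enough quantifiable values for a meaningful group comparison.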

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.461 4.94e-12

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. It’s also possible to use Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44      1.62 0.198
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the 4-methyl-pentanoic acid concentration has five outliers and that only a few data points are above the limit of detection (= 0.03); the outliers are therefore kept. The Shapiro-Wilk test rejects normality of the residuals, but Levene’s test indicates equal variances. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR-adjusted p-values.

7.9.3 Kruskal-Wallis test

7.9.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.              n statistic    df     p method        
## * <chr>        <int>     <dbl> <int> <dbl> <chr>         
## 1 m4_pentanoic    48      1.32     3 0.724 Kruskal-Wallis
7.9.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.              n effsize method  magnitude
## * <chr>        <int>   <dbl> <chr>   <ord>    
## 1 m4_pentanoic    48 -0.0381 eta2[H] small
7.9.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.          group1 group2      n1    n2 statistic     p p.adj p.adj.signif
## * <chr>        <chr>  <chr>    <int> <int>     <dbl> <dbl> <dbl> <chr>       
## 1 m4_pentanoic CTRL   PFOS        12    12   -0.165  0.869 0.987 ns          
## 2 m4_pentanoic CTRL   VAN         12    12    0.0165 0.987 0.987 ns          
## 3 m4_pentanoic CTRL   VAN+PFOS    12    12    0.875  0.381 0.781 ns          
## 4 m4_pentanoic PFOS   VAN         12    12    0.182  0.856 0.987 ns          
## 5 m4_pentanoic PFOS   VAN+PFOS    12    12    1.04   0.298 0.781 ns          
## 6 m4_pentanoic VAN    VAN+PFOS    12    12    0.859  0.391 0.781 ns
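
The Wilcoxon alternative mentioned above is available in base R as stats::pairwise.wilcox.test(). The sketch below runs it with FDR correction on hypothetical, fully separated toy data; it is not part of the project analysis, which uses Dunn’s test.

```r
# Hypothetical data: three fully separated groups of four values, no ties
x <- c(1.1, 2.3, 3.7, 4.0, 5.2, 5.9, 6.4, 7.1, 7.8, 8.5, 9.6, 10.2)
g <- factor(rep(c("A", "B", "C"), each = 4))

# Pairwise Wilcoxon rank-sum tests with FDR-adjusted p-values
pw <- pairwise.wilcox.test(x, g, p.adjust.method = "fdr")

# Lower-triangular matrix of adjusted p-values (NA above the diagonal)
pw$p.value
```

Unlike Dunn’s test, each pairwise Wilcoxon test re-ranks only the two groups being compared, so the two approaches can give different results on the same data.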

7.9.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM isocaproate",limits = c(0,1.78),breaks = seq(0,1.78,0.2)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).

p.m4p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m4p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.10 Hexanoic acid / Caproate

7.10.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "hexanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$hexanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 CTRL      hexanoic    12 0.053 0.041
## 2 PFOS      hexanoic    12 0.07  0.058
## 3 VAN       hexanoic    12 0.004 0.014
## 4 VAN+PFOS  hexanoic    12 0.006 0.021

7.10.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

ANOVA assumes the following characteristics of the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This has already been verified for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 2 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

The data contain two outliers, both arising because the majority of data points in the vancomycin-treated groups are below the limit of detection (= 0.01). These outliers are therefore left in.

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic   p.value
##   <chr>                <dbl>     <dbl>
## 1 residuals(model)     0.874 0.0000992

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. It’s also possible to use Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1     3    44      6.86 0.000687
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the hexanoic acid concentration has two outliers and several data points below the limit of detection. The Shapiro-Wilk test rejects normality of the residuals and Levene’s test indicates unequal variances. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with FDR-adjusted p-values.

7.10.3 Kruskal-Wallis test

7.10.3.0.1 Perform test
res.aov <- dat.clean %>% kruskal_test(FORMULA)
res.aov
## # A tibble: 1 × 6
##   .y.          n statistic    df          p method        
## * <chr>    <int>     <dbl> <int>      <dbl> <chr>         
## 1 hexanoic    48      28.0     3 0.00000356 Kruskal-Wallis
7.10.3.0.2 Effect size

The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).

dat.clean %>% kruskal_effsize(FORMULA)
## # A tibble: 1 × 5
##   .y.          n effsize method  magnitude
## * <chr>    <int>   <dbl> <chr>   <ord>    
## 1 hexanoic    48   0.569 eta2[H] large
7.10.3.0.3 Post-hoc test if interaction is significant

A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.

# pairwise comparisons
pwc <- dat.clean %>% 
  dunn_test(FORMULA, p.adjust.method = "fdr") 
pwc
## # A tibble: 6 × 9
##   .y.      group1 group2      n1    n2 statistic         p    p.adj p.adj.signif
## * <chr>    <chr>  <chr>    <int> <int>     <dbl>     <dbl>    <dbl> <chr>       
## 1 hexanoic CTRL   PFOS        12    12    0.717  0.473     0.568    ns          
## 2 hexanoic CTRL   VAN         12    12   -3.39   0.000699  0.00139  **          
## 3 hexanoic CTRL   VAN+PFOS    12    12   -3.31   0.000927  0.00139  **          
## 4 hexanoic PFOS   VAN         12    12   -4.11   0.0000401 0.000168 ***         
## 5 hexanoic PFOS   VAN+PFOS    12    12   -4.03   0.0000560 0.000168 ***         
## 6 hexanoic VAN    VAN+PFOS    12    12    0.0779 0.938     0.938    ns

7.10.4 Create figure

## Prepare statistical information:
pwc.adj <- pwc %>% 
  add_x_position(x = PREDICTOR) %>%
  p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)

# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
  stat.sig <- pwc.adj %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
  stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
    add_y_position(step.increase = 0.25) %>%
    mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM caproate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
  
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p
## Warning: Removed 15 rows containing missing values (`geom_point()`).

p.hexanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.hexanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 15 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 15 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.11 Heptanoic acid / Enanthate

7.11.1 Prepare data

# load data 
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")

# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "heptanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"

# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$heptanoic))

# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))

# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
##   treatment variable      n  mean    sd
##   <chr>     <fct>     <dbl> <dbl> <dbl>
## 1 CTRL      heptanoic    12 0.004 0.012
## 2 PFOS      heptanoic    12 0     0    
## 3 VAN       heptanoic    12 0.009 0.033
## 4 VAN+PFOS  heptanoic    12 0.008 0.028

7.11.2 Visualise

Create a boxplot of the data.

# Create plot
bxp <- dat.clean %>%
  ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
            y = OUTCOME,
            color = PREDICTOR[1],
            facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
            palette = params$COL)
bxp

### Assumptions and preliminary tests

ANOVA assumes the following characteristics of the data:

  • Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group.
    This has already been verified for the whole project.

  • No significant outliers in any of the groups.

  • Normality. The data for each group should be approximately normally distributed.

  • Homogeneity of variances. The variance of the outcome variable should be equal in each group.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

Identify outliers
Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

# Test for outliers
dat.clean %>% 
  group_by(across(all_of(PREDICTOR))) %>% 
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
##   treatment rat_name ordering pfos  van    bw_0  bw_1  bw_2  bw_3  bw_4  bw_5
##   <chr>     <chr>       <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL      R10            10 no    no     266.  273.  275.  285.  291.   294
## 2 VAN       R19            31 no    yes    256.  263.  269   279.  282.   287
## 3 VAN+PFOS  R42            42 yes   yes    240.  244.  251.  260   264.   272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## #   cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## #   liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## #   pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## #   pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## #   pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## #   pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …

Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model residuals.

# Build the linear model
model  <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))

# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
## # A tibble: 1 × 3
##   variable         statistic  p.value
##   <chr>                <dbl>    <dbl>
## 1 residuals(model)     0.399 9.66e-13

Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity of variances.

plot(model, 1)

2. It’s also possible to use Levene’s test to check the homogeneity of variances:
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3    44     0.446 0.722
# Save result
EQUAL.VAR <- dat.clean %>% levene_test(FORMULA) %>% pull(p) > 0.05
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

This shows that the heptanoic acid concentration has only three data points above the limit of detection (= 0.03), which is deemed too few for analysis. No final analysis was therefore made.

7.11.3 Create figure

#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
          fill = PREDICTOR,
          add =  "jitter",
          add.params = list(size = 1)) +
  scale_fill_manual(values = params$COL) +
  scale_y_continuous(name = "mM enanthate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
  labs(fill = "Treatment") +
  scale_x_discrete(name = "Treatment") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) +
  geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
  
#p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p
## Warning: Removed 25 rows containing missing values (`geom_point()`).

p.heptanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.heptanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))

# Plot for saving without legend
p2 <- p + theme(legend.position = "none")

ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 25 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 25 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
SCFA in mM

7.12 SCFA ggarrange

Here all plots from the SCFA analysis are combined into one figure.

params <- readRDS("R_objects/animal_params.RDS")

# Load rdata files with scfa plots
# `pattern` is a regular expression, not a glob: match files ending in ".rdata"
pfiles <- list.files(path = "R_objects/scfa/", pattern = "\\.rdata$", full.names = TRUE)
lapply(pfiles, load,.GlobalEnv)
## [[1]]
## [1] "p.acetic"
## 
## [[2]]
## [1] "p.butanoic"
## 
## [[3]]
## [1] "p.formic"
## 
## [[4]]
## [1] "p.heptanoic"
## 
## [[5]]
## [1] "p.hexanoic"
## 
## [[6]]
## [1] "p.m2p"
## 
## [[7]]
## [1] "p.m3b"
## 
## [[8]]
## [1] "p.m4p"
## 
## [[9]]
## [1] "p.pentanoic"
## 
## [[10]]
## [1] "p.propanoic"
# Create plot
p.all <- ggarrange(p.formic,p.acetic,p.propanoic,p.m2p,p.butanoic,p.m3b,p.pentanoic,p.m4p,p.hexanoic,p.heptanoic,
                   ncol = 5, nrow = 2, 
                   common.legend = TRUE,
                   legend = "top",
                   label.x = 0,
                   font.label = list(size = 24, face = "bold"),
                   labels = c("A","B","C","D","E","F","G","H","I","J"),
                   align = "hv")
## Warning: Removed 22 rows containing missing values (`geom_point()`).
## Removed 22 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 13 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Warning: Removed 15 rows containing missing values (`geom_point()`).
## Warning: Removed 25 rows containing missing values (`geom_point()`).
p.all

# Save graphics
ggsave(filename = "plots/animal_data/scfa/all.png", p.all, device = "png", dpi = 300, height = 200, width = 400, units = "mm")
ggsave(filename = "plots/animal_data/scfa/all.pdf", p.all, device = "pdf", dpi = 300, height = 200, width = 400, units = "mm")

# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.

8 SETTINGS

Overview of the parameters and packages that were used for this analysis

8.1 PARAMETERS

The following parameters were set for this analysis:

params <- readRDS("R_objects/import_params.RDS")

tmp <- unlist(params)
dat <- data.frame(Parameter = names(tmp), Value = unname(tmp))


kbl(dat, row.names = F) %>% kable_classic(lightable_options = "striped")

8.2 PACKAGES

The analysis was run in R version 4.2.2 using the following packages:

pack <- data.frame(Package = (.packages()))

# Look up the installed version of each attached package
pack$Version <- vapply(pack$Package, function(pkg) as.character(packageVersion(pkg)), character(1), USE.NAMES = FALSE)

kbl(pack[order(pack$Package),], row.names = F) %>% kable_classic(lightable_options = "striped")   
Package      Version
ape          5.7.1
base         4.2.2
cowplot      1.1.1
datasets     4.2.2
DAtest       2.8.0
decontam     1.18.0
dplyr        1.1.0
forcats      1.0.0
ggbreak      0.1.1
ggplot2      3.4.2
ggpubr       0.6.0
ggrepel      0.9.3
graphics     4.2.2
grDevices    4.2.2
kableExtra   1.3.4
lattice      0.20.45
lubridate    1.9.2
methods      4.2.2
pals         1.7
permute      0.9.7
phangorn     2.11.1
pheatmap     1.0.12
phyloseq     1.42.0
plotly       4.10.1
purrr        1.0.1
readr        2.1.4
rstatix      0.7.2
stats        4.2.2
stringr      1.5.0
tibble       3.1.8
tidyr        1.3.0
tidyverse    2.0.0
utils        4.2.2
vegan        2.6.4